We present a methodology for parallel acceleration of learning in the presence of matrix orthogonality and unitarity constraints of interest in several branches of machine learning. We show how an apparently sequential elementary rotation parametrization can be restructured into blocks of commutative operations using a well-known tool for coloring the edges of complete graphs, in turn widely applied to schedule round-robin (all-against-all) sports tournaments. The resulting decomposition admits an algorithm to compute a fully-parametrized orthogonal matrix from its rotation parameters in $O(n)$ sequential steps and one to compute the gradient of a training loss with respect to its parameters in $O(n\log n)$ steps. We discuss parametric restrictions of interest to generative modeling and present promising performance results with a prototype GPU implementation.
We present a novel framework for performing statistical sampling, expectation estimation, and partition function approximation using \emph{arbitrary} heuristic stochastic processes defined over discrete state spaces. Using a highly parallel construction we call the \emph{sequential constraining process}, we are able to simultaneously generate states with the heuristic process and accurately estimate their probabilities, even when they are far too small to be realistically inferred by direct counting. After showing that both theoretically correct importance sampling and Markov chain Monte Carlo are possible using the sequential constraining process, we integrate it into a methodology called \emph{state space sampling}, extending the ideas of state space search from computer science to the sampling context. The methodology comprises a dynamic data structure that constructs a robust Bayesian model of the statistics generated by the heuristic process subject to an accuracy constraint, the posterior Kullback-Leibler divergence. Sampling from the dynamic structure will generally yield partial states, which are completed by recursively calling the heuristic to refine the structure and resuming the sampling. Our experiments on various Ising models suggest that state space sampling enables heuristic state generation with accurate probability estimates, demonstrated by illustrating the convergence of a simulated annealing process to the Boltzmann distribution with increasing run length. Consequently, heretofore unprecedented direct importance sampling using the \emph{final} (marginal) distribution of a generic stochastic process is allowed, potentially augmenting the range of algorithms at the Monte Carlo practitioner's disposal.
We present new MCMC algorithms for computing the posterior distributions and expectations of the unknown variables in undirected graphical models with regular structure. For demonstration purposes, we focus on Markov Random Fields (MRFs). By partitioning the MRFs into non-overlapping trees, it is possible to compute the posterior distribution of a particular tree exactly by conditioning on the remaining tree. These exact solutions allow us to construct efficient blocked and Rao-Blackwellised MCMC algorithms. We show empirically that tree sampling is considerably more efficient than other partitioned sampling schemes and the naive Gibbs sampler, even in cases where loopy belief propagation fails to converge. We prove that tree sampling exhibits lower variance than the naive Gibbs sampler and other naive partitioning schemes using the theoretical measure of maximal correlation. We also construct new information theory tools for comparing different MCMC schemes and show that, under these, tree sampling is more efficient.
We propose a new Monte Carlo algorithm for complex discrete distributions. The algorithm is motivated by the N-Fold Way, which is an ingenious event-driven MCMC sampler that avoids rejection moves at any specific state. The N-Fold Way can however get "trapped" in cycles. We surmount this problem by modifying the sampling process. This correction does introduce bias, but the bias is subsequently corrected with a carefully engineered importance sampler.
This paper addresses the problem of sampling from binary distributions with constraints. In particular, it proposes an MCMC method to draw samples from a distribution of the set of all states at a specified distance from some reference state. For example, when the reference state is the vector of zeros, the algorithm can draw samples from a binary distribution with a constraint on the number of active variables, say the number of 1's. We motivate the need for this algorithm with examples from statistical physics and probabilistic inference. Unlike previous algorithms proposed to sample from binary distributions with these constraints, the new algorithm allows for large moves in state space and tends to propose them such that they are energetically favourable. The algorithm is demonstrated on three Boltzmann machines of varying difficulty: A ferromagnetic Ising model (with positive potentials), a restricted Boltzmann machine with learned Gabor-like filters as potentials, and a challenging three-dimensional spin-glass (with positive and negative potentials).
This paper introduces a new specialized algorithm for equilibrium Monte Carlo sampling of binary-valued systems, which allows for large moves in the state space. This is achieved by constructing self-avoiding walks (SAWs) in the state space. As a consequence, many bits are flipped in a single MCMC step. We name the algorithm SARDONICS, an acronym for Self-Avoiding Random Dynamics on Integer Complex Systems. The algorithm has several free parameters, but we show that Bayesian optimization can be used to automatically tune them. SARDONICS performs remarkably well in a broad number of sampling tasks: toroidal ferromagnetic and frustrated Ising models, 3D Ising models, restricted Boltzmann machines and chimera graphs arising in the design of quantum computers.
This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.