Asymptotic Properties of Recursive Maximum Likelihood Estimation in Non-Linear State-Space Models

Aug 01, 2018

Vladislav Z. B. Tadic, Arnaud Doucet

Using stochastic gradient search and the optimal filter derivative, it is possible to perform recursive (i.e., online) maximum likelihood estimation in a non-linear state-space model. As the optimal filter and its derivative are analytically intractable for such a model, they need to be approximated numerically. In [Poyiadjis, Doucet and Singh, Biometrika 2018], a recursive maximum likelihood algorithm based on a particle approximation to the optimal filter derivative has been proposed and studied through numerical simulations. Here, this algorithm and its asymptotic behavior are analyzed theoretically. We show that the algorithm accurately estimates maxima to the underlying (average) log-likelihood when the number of particles is sufficiently large. We also derive (relatively) tight bounds on the estimation error. The obtained results hold under (relatively) mild conditions and cover several classes of non-linear state-space models met in practice.
Aug 01, 2018

Vladislav Z. B. Tadic, Arnaud Doucet

**Click to Read Paper**

Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically either uses MCMC schemes which target the joint posterior of the parameters and some auxiliary latent variables or pseudo-marginal Metropolis-Hastings (MH) schemes which mimic a MH algorithm targeting the marginal posterior of the parameters by approximating unbiasedly the intractable likelihood. In scenarios where the parameters and auxiliary variables are strongly correlated under the posterior and/or this posterior is multimodal, Gibbs sampling or Hamiltonian Monte Carlo (HMC) will perform poorly and the pseudo-marginal MH algorithm, as any other MH scheme, will be inefficient for high dimensional parameters. We propose here an original MCMC algorithm, termed pseudo-marginal HMC, which approximates the HMC algorithm targeting the marginal posterior of the parameters. We demonstrate through experiments that pseudo-marginal HMC can outperform significantly both standard HMC and pseudo-marginal MH schemes.

**Click to Read Paper**
We present new algorithms to compute the mean of a set of empirical probability measures under the optimal transport metric. This mean, known as the Wasserstein barycenter, is the measure that minimizes the sum of its Wasserstein distances to each element in that set. We propose two original algorithms to compute Wasserstein barycenters that build upon the subgradient method. A direct implementation of these algorithms is, however, too costly because it would require the repeated resolution of large primal and dual optimal transport problems to compute subgradients. Extending the work of Cuturi (2013), we propose to smooth the Wasserstein distance used in the definition of Wasserstein barycenters with an entropic regularizer and recover in doing so a strictly convex objective whose gradients can be computed for a considerably cheaper computational cost using matrix scaling algorithms. We use these algorithms to visualize a large family of images and to solve a constrained clustering problem.

**Click to Read Paper**
We propose in this work a new family of kernels for variable-length time series. Our work builds upon the vector autoregressive (VAR) model for multivariate stochastic processes: given a multivariate time series x, we consider the likelihood function p_{\theta}(x) of different parameters \theta in the VAR model as features to describe x. To compare two time series x and x', we form the product of their features p_{\theta}(x) p_{\theta}(x') which is integrated out w.r.t \theta using a matrix normal-inverse Wishart prior. Among other properties, this kernel can be easily computed when the dimension d of the time series is much larger than the lengths of the considered time series x and x'. It can also be generalized to time series taking values in arbitrary state spaces, as long as the state space itself is endowed with a kernel \kappa. In that case, the kernel between x and x' is a a function of the Gram matrices produced by \kappa on observations and subsequences of observations enumerated in x and x'. We describe a computationally efficient implementation of this generalization that uses low-rank matrix factorization techniques. These kernels are compared to other known kernels using a set of benchmark classification tasks carried out with support vector machines.

**Click to Read Paper**
Efficient Bayesian Inference for Generalized Bradley-Terry Models

Nov 08, 2010

Francois Caron, Arnaud Doucet

The Bradley-Terry model is a popular approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. It has found many applications including animal behaviour, chess ranking and multiclass classification. Numerous extensions of the basic model have also been proposed in the literature including models with ties, multiple comparisons, group comparisons and random graphs. From a computational point of view, Hunter (2004) has proposed efficient iterative MM (minorization-maximization) algorithms to perform maximum likelihood estimation for these generalized Bradley-Terry models whereas Bayesian inference is typically performed using MCMC (Markov chain Monte Carlo) algorithms based on tailored Metropolis-Hastings (M-H) proposals. We show here that these MM\ algorithms can be reinterpreted as special instances of Expectation-Maximization (EM) algorithms associated to suitable sets of latent variables and propose some original extensions. These latent variables allow us to derive simple Gibbs samplers for Bayesian inference. We demonstrate experimentally the efficiency of these algorithms on a variety of applications.
Nov 08, 2010

Francois Caron, Arnaud Doucet

**Click to Read Paper**

The asymptotic behavior of the stochastic gradient algorithm with a biased gradient estimator is analyzed. Relying on arguments based on the dynamic system theory (chain-recurrence) and the differential geometry (Yomdin theorem and Lojasiewicz inequality), tight bounds on the asymptotic bias of the iterates generated by such an algorithm are derived. The obtained results hold under mild conditions and cover a broad class of high-dimensional nonlinear algorithms. Using these results, the asymptotic properties of the policy-gradient (reinforcement) learning and adaptive population Monte Carlo sampling are studied. Relying on the same results, the asymptotic behavior of the recursive maximum split-likelihood estimation in hidden Markov models is analyzed, too.

**Click to Read Paper**
On the Selection of Initialization and Activation Function for Deep Neural Networks

Oct 07, 2018

Soufiane Hayou, Arnaud Doucet, Judith Rousseau

Oct 07, 2018

Soufiane Hayou, Arnaud Doucet, Judith Rousseau

**Click to Read Paper**

On Markov chain Monte Carlo methods for tall data

May 11, 2015

RÃ©mi Bardenet, Arnaud Doucet, Chris Holmes

Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number $n$ of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Metropolis-Hastings algorithm in a Bayesian inference context have been recently proposed in machine learning and computational statistics. These approaches can be grouped into two categories: divide-and-conquer approaches and, subsampling-based algorithms. The aims of this article are as follows. First, we present a comprehensive review of the existing literature, commenting on the underlying assumptions and theoretical guarantees of each method. Second, by leveraging our understanding of these limitations, we propose an original subsampling-based approach which samples from a distribution provably close to the posterior distribution of interest, yet can require less than $O(n)$ data point likelihood evaluations at each iteration for certain statistical models in favourable scenarios. Finally, we have only been able so far to propose subsampling-based methods which display good performance in scenarios where the Bernstein-von Mises approximation of the target posterior distribution is excellent. It remains an open challenge to develop such methods in scenarios where the Bernstein-von Mises approximation is poor.
May 11, 2015

RÃ©mi Bardenet, Arnaud Doucet, Chris Holmes

**Click to Read Paper**

Sparsity-Promoting Bayesian Dynamic Linear Models

Mar 01, 2012

FranÃ§ois Caron, Luke Bornn, Arnaud Doucet

Sparsity-promoting priors have become increasingly popular over recent years due to an increased number of regression and classification applications involving a large number of predictors. In time series applications where observations are collected over time, it is often unrealistic to assume that the underlying sparsity pattern is fixed. We propose here an original class of flexible Bayesian linear models for dynamic sparsity modelling. The proposed class of models expands upon the existing Bayesian literature on sparse regression using generalized multivariate hyperbolic distributions. The properties of the models are explored through both analytic results and simulation studies. We demonstrate the model on a financial application where it is shown that it accurately represents the patterns seen in the analysis of stock and derivative data, and is able to detect major events by filtering an artificial portfolio of assets.
Mar 01, 2012

FranÃ§ois Caron, Luke Bornn, Arnaud Doucet

**Click to Read Paper**

Hamiltonian Variational Auto-Encoder

May 29, 2018

Anthony L. Caterini, Arnaud Doucet, Dino Sejdinovic

May 29, 2018

Anthony L. Caterini, Arnaud Doucet, Dino Sejdinovic

**Click to Read Paper**

**Click to Read Paper**

Reversible Jump MCMC Simulated Annealing for Neural Networks

Jan 16, 2013

Christophe Andrieu, Nando de Freitas, Arnaud Doucet

Jan 16, 2013

Christophe Andrieu, Nando de Freitas, Arnaud Doucet

**Click to Read Paper**

Toward Practical N2 Monte Carlo: the Marginal Particle Filter

Jul 04, 2012

Mike Klaas, Nando de Freitas, Arnaud Doucet

Jul 04, 2012

Mike Klaas, Nando de Freitas, Arnaud Doucet

**Click to Read Paper**

Bayesian nonparametric image segmentation using a generalized Swendsen-Wang algorithm

Feb 09, 2016

Richard Yi Da Xu, Francois Caron, Arnaud Doucet

Unsupervised image segmentation aims at clustering the set of pixels of an image into spatially homogeneous regions. We introduce here a class of Bayesian nonparametric models to address this problem. These models are based on a combination of a Potts-like spatial smoothness component and a prior on partitions which is used to control both the number and size of clusters. This class of models is flexible enough to include the standard Potts model and the more recent Potts-Dirichlet Process model \cite{Orbanz2008}. More importantly, any prior on partitions can be introduced to control the global clustering structure so that it is possible to penalize small or large clusters if necessary. Bayesian computation is carried out using an original generalized Swendsen-Wang algorithm. Experiments demonstrate that our method is competitive in terms of RAND\ index compared to popular image segmentation methods, such as mean-shift, and recent alternative Bayesian nonparametric models.
Feb 09, 2016

Richard Yi Da Xu, Francois Caron, Arnaud Doucet

**Click to Read Paper**

Asynchronous Anytime Sequential Monte Carlo

Jul 10, 2014

Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh

We introduce a new sequential Monte Carlo algorithm we call the particle cascade. The particle cascade is an asynchronous, anytime alternative to traditional particle filtering algorithms. It uses no barrier synchronizations which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run forever to emit an unbounded number of particles while keeping within a fixed memory budget. We prove that the particle cascade is an unbiased marginal likelihood estimator which means that it can be straightforwardly plugged into existing pseudomarginal methods.
Jul 10, 2014

Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh

**Click to Read Paper**

Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks

Jan 16, 2013

Arnaud Doucet, Nando de Freitas, Kevin Murphy, Stuart Russell

Jan 16, 2013

Arnaud Doucet, Nando de Freitas, Kevin Murphy, Stuart Russell

**Click to Read Paper**

New inference strategies for solving Markov Decision Processes using reversible jump MCMC

May 09, 2012

Matthias Hoffman, Hendrik Kueck, Nando de Freitas, Arnaud Doucet

May 09, 2012

Matthias Hoffman, Hendrik Kueck, Nando de Freitas, Arnaud Doucet

**Click to Read Paper**

Hamiltonian Descent Methods

Sep 13, 2018

Chris J. Maddison, Daniel Paulin, Yee Whye Teh, Brendan O'Donoghue, Arnaud Doucet

Sep 13, 2018

Chris J. Maddison, Daniel Paulin, Yee Whye Teh, Brendan O'Donoghue, Arnaud Doucet

**Click to Read Paper**

Particle Value Functions

Mar 16, 2017

Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Arnaud Doucet, Andriy Mnih, Yee Whye Teh

Mar 16, 2017

Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Arnaud Doucet, Andriy Mnih, Yee Whye Teh

**Click to Read Paper**

Interacting Particle Markov Chain Monte Carlo

Apr 12, 2017

Tom Rainforth, Christian A. Naesseth, Fredrik Lindsten, Brooks Paige, Jan-Willem van de Meent, Arnaud Doucet, Frank Wood

We introduce interacting particle Markov chain Monte Carlo (iPMCMC), a PMCMC method based on an interacting pool of standard and conditional sequential Monte Carlo samplers. Like related methods, iPMCMC is a Markov chain Monte Carlo sampler on an extended space. We present empirical results that show significant improvements in mixing rates relative to both non-interacting PMCMC samplers, and a single PMCMC sampler with an equivalent memory and computational budget. An additional advantage of the iPMCMC method is that it is suitable for distributed and multi-core architectures.
Apr 12, 2017

Tom Rainforth, Christian A. Naesseth, Fredrik Lindsten, Brooks Paige, Jan-Willem van de Meent, Arnaud Doucet, Frank Wood

**Click to Read Paper**