Stein variational gradient descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate a given distribution, based on an efficient gradient-based update that is guaranteed to optimally decrease the KL divergence within a function space. This paper develops the first theoretical analysis of SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by the Stein operator. We also provide a number of results on the Stein operator and Stein's identity using the notion of weak derivative, including a new proof of the distinguishability of the Stein discrepancy under weak conditions.
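Stein's identity, which the analysis relies on, states that E_p[f(x) ∇log p(x) + ∇f(x)] = 0 for suitable test functions f. A quick Monte Carlo sanity check for an example pair of our own choosing (p a standard Gaussian, f(x) = x²), not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: standard Gaussian, with score function s(x) = d/dx log p(x) = -x.
score = lambda x: -x

# Test function f(x) = x**2, with derivative f'(x) = 2*x.
f = lambda x: x**2
df = lambda x: 2 * x

# Stein's identity: E_p[f(x) * s(x) + f'(x)] = 0 for x ~ p.
x = rng.standard_normal(1_000_000)
stein_expectation = np.mean(f(x) * score(x) + df(x))
print(stein_expectation)  # close to 0
```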

Learning to Draw Samples with Amortized Stein Variational Gradient Descent

Oct 30, 2017

Yihao Feng, Dilin Wang, Qiang Liu

We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method is based on iteratively adjusting the neural network parameters so that the output changes along a Stein variational gradient direction (Liu & Wang, 2016) that maximally decreases the KL divergence with the target distribution. Our method works for any target distribution specified by its unnormalized density function, and can train any black-box architecture that is differentiable with respect to the parameters we want to adapt. We demonstrate our method with a number of applications, including variational autoencoders (VAE) with expressive encoders that model complex latent-space structures, and hyper-parameter learning of MCMC samplers that allows Bayesian inference to adaptively improve itself as it sees more data.
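A toy illustration of the amortized update (our own minimal setup, not the paper's code): a two-parameter "network" f_θ(z) = θ₀ + θ₁·z is adjusted so that its outputs follow the Stein variational gradient toward a 1D Gaussian target N(2, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, lr = 2.0, 200, 0.05
theta = np.array([0.0, 0.5])            # toy "network": f(z) = theta[0] + theta[1] * z

def svgd_direction(x):
    """SVGD direction phi*(x_i) for target N(mu, 1), with an RBF kernel."""
    score = mu - x                      # d/dx log p(x)
    diff = x[:, None] - x[None, :]      # diff[j, i] = x_j - x_i
    sq = diff**2
    h = np.median(sq) / np.log(n) + 1e-8    # median bandwidth heuristic
    K = np.exp(-sq / h)
    grad_K = (-2.0 / h) * (diff * K).sum(axis=0)  # sum_j d/dx_j k(x_j, x_i)
    return (K @ score + grad_K) / n

for _ in range(2000):
    z = rng.standard_normal(n)
    x = theta[0] + theta[1] * z         # network output = particles
    phi = svgd_direction(x)
    # Chain rule: d f/d theta[0] = 1, d f/d theta[1] = z.
    theta += lr * np.array([phi.mean(), (z * phi).mean()])

print(theta)  # theta[0] near the target mean 2, |theta[1]| near 1
```

The kernel-weighted score pulls the network output toward the target mean, while the repulsive kernel-gradient term keeps the output spread out, so |θ₁| settles near the target standard deviation.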

* Accepted by UAI 2017


Stein variational gradient descent (SVGD) is a non-parametric inference algorithm that evolves a set of particles to fit a given distribution of interest. We analyze the non-asymptotic properties of SVGD, showing that there exists a set of functions, which we call the Stein matching set, whose expectations are exactly estimated by any set of particles that satisfies the fixed-point equation of SVGD. This set is the image of the Stein operator applied to the feature maps of the positive definite kernel used in SVGD. Our results provide a theoretical framework for analyzing the properties of SVGD with different kernels, shedding light on the optimal kernel choice. In particular, we show that SVGD with linear kernels yields exact estimation of means and variances on Gaussian distributions, while random Fourier features enable probabilistic bounds for distributional approximation. Our results offer a refreshing view of the classical inference problem as fitting Stein's identity or solving the Stein equation, which may motivate more efficient algorithms.
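The linear-kernel claim admits a quick numerical check in 1D (our own toy setup, not code from the paper): with k(x, y) = xy + 1 and target N(μ, σ²) = N(2, 1), particles at the SVGD fixed point reproduce the mean μ = 2 and second moment μ² + σ² = 5 exactly, even though the random initial sample does not.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, lr = 2.0, 1.0, 50, 0.05
x = rng.standard_normal(n)              # initial sample: moments are inexact

for _ in range(5000):
    score = (mu - x) / sigma2           # d/dx log N(mu, sigma2)
    K = np.outer(x, x) + 1.0            # linear kernel k(x_j, x_i) = x_j * x_i + 1
    # phi*(x_i) = (1/n) sum_j [k(x_j, x_i) * score_j + d/dx_j k(x_j, x_i)],
    # and d/dx_j k(x_j, x_i) = x_i, so the repulsive part is simply x_i.
    x = x + lr * ((K @ score) / n + x)

print(x.mean(), np.mean(x**2))          # converge to mu = 2 and mu**2 + sigma2 = 5
```

At a fixed point phi*(x_i) = 0 for every i, which forces the empirical first and second moments to match the Gaussian's exactly, in line with the Stein matching set result.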

* Conference on Neural Information Processing Systems (NIPS) 2018


* ICML 2018


Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE

Jul 04, 2017

Qiang Liu, Dilin Wang



* This paper is about variance reduction in Monte Carlo estimation of the KL divergence, NIPS 2016


Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning

Nov 26, 2016

Dilin Wang, Qiang Liu


* Under review as a conference paper at ICLR 2017


Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Aug 19, 2016

Qiang Liu, Dilin Wang

We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution, by applying a form of functional gradient descent that minimizes the KL divergence. Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. The derivation of our method is based on a new theoretical result that connects the derivative of KL divergence under smooth transforms with Stein's identity and a recently proposed kernelized Stein discrepancy, which is of independent interest.
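The particle update at the heart of the method can be sketched in a few lines of NumPy. This is a minimal 1D illustration under our own choices (target N(2, 1), RBF kernel with the median bandwidth heuristic), not the authors' released code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lr, mu = 100, 0.1, 2.0
x = rng.standard_normal(n)                 # particles, initialized away from the target

for _ in range(1000):
    score = mu - x                         # d/dx log N(mu, 1)
    diff = x[:, None] - x[None, :]         # diff[j, i] = x_j - x_i
    sq = diff**2
    h = np.median(sq) / np.log(n)          # median bandwidth heuristic
    K = np.exp(-sq / h)                    # RBF kernel k(x_j, x_i)
    grad_K = (-2.0 / h) * (diff * K).sum(axis=0)   # sum_j d/dx_j k(x_j, x_i)
    # phi*(x_i): kernel-weighted scores (drift) plus kernel gradients (repulsion)
    x = x + lr * (K @ score + grad_K) / n

print(x.mean(), x.var())  # close to the target's mean 2 and variance 1
```

The first term of the update moves particles toward high-density regions of the target; the second acts as a repulsive force that prevents them from collapsing onto a single mode.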

* To appear in NIPS 2016


Distributed Estimation, Information Loss and Exponential Families

Oct 09, 2014

Qiang Liu, Alexander Ihler


* To appear in NIPS 2014


* This is a journal version of our conference paper "Variational Algorithms for Marginal MAP" in UAI 2011 [arXiv:1202.3742]; this version is considerably expanded, with more detail in its development, examples, algorithms, and proofs; additional experiments; and a junction-graph version of the central message-passing algorithm


* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)



* Conference on Neural Information Processing Systems (NIPS) 2018


* Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI 2012)


* Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI 2010)
