Mondrian Forests: Efficient Online Random Forests

Feb 16, 2015

Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes

Feb 14, 2018

Hyunjik Kim, Yee Whye Teh

Inferring ground truth from multi-annotator ordinal data: a probabilistic approach

Apr 30, 2013

Balaji Lakshminarayanan, Yee Whye Teh

Belief Optimization for Binary Networks: A Stable Alternative to Loopy Belief Propagation

Jan 10, 2013

Max Welling, Yee Whye Teh

A Fast and Simple Algorithm for Training Neural Probabilistic Language Models

Jun 27, 2012

Andriy Mnih, Yee Whye Teh

In spite of their superior performance, neural probabilistic language models (NPLMs) remain far less widely used than n-gram models due to their notoriously long training times, which are measured in weeks even for moderately sized datasets. Training NPLMs is computationally expensive because they are explicitly normalized, which leads to having to consider all words in the vocabulary when computing the log-likelihood gradients. We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. We investigate the behaviour of the algorithm on the Penn Treebank corpus and show that it reduces the training times by more than an order of magnitude without affecting the quality of the resulting models. The algorithm is also more efficient and much more stable than importance sampling because it requires far fewer noise samples to perform well. We demonstrate the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary, obtaining state-of-the-art results on the Microsoft Research Sentence Completion Challenge dataset.
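As a rough sketch of the noise-contrastive idea described above (the array names and shapes here are our own illustration, not the paper's parameterization), each observed word is discriminated from k noise samples with a logistic loss on the gap between the unnormalized model score and the scaled log noise probability:

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def nce_loss(data_score, data_logq, noise_score, noise_logq, k):
    """Noise-contrastive estimation loss for one batch (illustrative sketch).

    data_score:  unnormalized model log-scores of the observed words, shape (n,)
    data_logq:   log-probabilities of those words under the noise distribution, (n,)
    noise_score: model log-scores of the k noise samples per word, shape (n, k)
    noise_logq:  noise log-probabilities of those samples, shape (n, k)
    """
    # Classify observed words as "data": logit = s_theta(w) - log(k * q(w)).
    pos = log_sigmoid(data_score - (np.log(k) + data_logq))
    # Classify the sampled words as "noise".
    neg = log_sigmoid(-(noise_score - (np.log(k) + noise_logq)))
    return -(pos.sum() + neg.sum())
```

Because the loss only touches the k sampled noise words rather than the whole vocabulary, the per-example gradient cost is O(k) instead of O(|V|), which is the source of the order-of-magnitude speedup the abstract reports.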

Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks

Feb 14, 2012

Vinayak Rao, Yee Whye Teh

Learning Item Trees for Probabilistic Modelling of Implicit Feedback

Sep 27, 2011

Andriy Mnih, Yee Whye Teh

The Mondrian Kernel

Jun 16, 2016

Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh

We introduce the Mondrian kernel, a fast random feature approximation to the Laplace kernel. It is suitable for both batch and online learning, and admits a fast kernel-width-selection procedure as the random features can be re-used efficiently for all kernel widths. The features are constructed by sampling trees via a Mondrian process [Roy and Teh, 2009], and we highlight the connection to Mondrian forests [Lakshminarayanan et al., 2014], where trees are also sampled via a Mondrian process, but fit independently. This link provides a new insight into the relationship between kernel methods and random forests.
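For reference, the Laplace kernel being approximated has the standard form, with the Mondrian lifetime parameter $\lambda$ playing the role of an inverse kernel width:

$$ k(\mathbf{x}, \mathbf{x}') = \exp\left(-\lambda \, \lVert \mathbf{x} - \mathbf{x}' \rVert_1\right). $$

The random-feature estimate is then the fraction of the sampled Mondrian trees whose partition places $\mathbf{x}$ and $\mathbf{x}'$ in the same cell, which converges to $k(\mathbf{x}, \mathbf{x}')$ as the number of trees grows.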

A nonparametric HMM for genetic imputation and coalescent inference

Nov 02, 2016

Lloyd T. Elliott, Yee Whye Teh

Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and have large support for self transitions. We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self transitions and nonhomogeneity. Our model provides a parameterization of the genetic process that is more parsimonious than other more general nonparametric models which have previously been applied to population genetics. We provide truncation-free MCMC inference for our model using a new auxiliary sampling scheme for Bayesian nonparametric HMMs. In a series of experiments on male X chromosome data from the Thousand Genomes Project and also on data simulated from a population bottleneck we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model. We find that the number of HMM states found by our model is correlated with the time to the most recent common ancestor in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.

Discovering Multiple Constraints that are Frequently Approximately Satisfied

Jan 10, 2013

Geoffrey E. Hinton, Yee Whye Teh

Some high-dimensional datasets can be modelled by assuming that there are many different linear constraints, each of which is Frequently Approximately Satisfied (FAS) by the data. The probability of a data vector under the model is then proportional to the product of the probabilities of its constraint violations. We describe three methods of learning products of constraints using a heavy-tailed probability distribution for the violations.
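Schematically (the notation here is ours, not the paper's), a product-of-constraints model with linear constraints $\mathbf{w}_c$ and a heavy-tailed violation density $p_c$ assigns

$$ p(\mathbf{x}) \;\propto\; \prod_{c=1}^{C} p_c\!\left(\mathbf{w}_c^{\top}\mathbf{x}\right), $$

so a vector that approximately satisfies every constraint (all violations near zero) receives high probability, while the heavy tails keep occasional large violations from being penalized too severely.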

Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data

Oct 26, 2018

Xenia Miscouridou, François Caron, Yee Whye Teh

We propose a novel class of network models for temporal dyadic interaction data. Our goal is to capture a number of important features often observed in social interactions: sparsity, degree heterogeneity, community structure and reciprocity. We propose a family of models based on self-exciting Hawkes point processes in which events depend on the history of the process. The key component is the conditional intensity function of the Hawkes process, which captures the fact that interactions may arise as a response to past interactions (reciprocity) or due to shared interests between individuals (community structure). To capture the sparsity and degree heterogeneity, the base (non-time-dependent) part of the intensity function builds on compound random measures, following Todeschini et al. (2016). We conduct experiments on a variety of real-world temporal interaction data and show that the proposed model outperforms many competing approaches for link prediction, and leads to interpretable parameters.
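As a generic illustration of the self-exciting structure described above (our notation, not the paper's exact specification), the conditional intensity of a Hawkes process for interactions between a pair of individuals $(i, j)$ takes the form

$$ \lambda_{ij}(t) \;=\; \mu_{ij} \;+\; \sum_{t_k < t} g(t - t_k), $$

where $\mu_{ij}$ is the time-independent base rate (here built from compound random measures, which supply sparsity and degree heterogeneity) and the excitation kernel $g$, e.g. an exponential decay $g(u) = \eta\, e^{-\delta u}$, raises the rate of interaction after recent events in the pair, producing reciprocity.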

Causal Inference via Kernel Deviance Measures

Apr 12, 2018

Jovana Mitrovic, Dino Sejdinovic, Yee Whye Teh

Poisson intensity estimation with reproducing kernels

Jun 26, 2017

Seth Flaxman, Yee Whye Teh, Dino Sejdinovic

Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially when observed points lie in a high-dimensional space. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. Whereas RKHS models used in supervised learning rely on the so-called representer theorem, the form of the inhomogeneous Poisson process likelihood means that the representer theorem does not apply. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.
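Concretely, writing the intensity as $\lambda(x) = f(x)^2$ for an RKHS function $f$, the inhomogeneous Poisson log-likelihood of observed points $x_1, \dots, x_N$ on a domain $\Omega$ is

$$ \log p(x_1, \dots, x_N \mid \lambda) \;=\; \sum_{i=1}^{N} \log f(x_i)^2 \;-\; \int_{\Omega} f(x)^2 \, dx \;+\; \text{const}, $$

and it is the integral term that prevents the classical representer theorem from applying directly; the transformed RKHS mentioned above restores a finite-dimensional optimization.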

Gaussian Processes for Survival Analysis

Nov 02, 2016

Tamara Fernández, Nicolás Rivera, Yee Whye Teh

We introduce a semi-parametric Bayesian model for survival analysis. The model is centred on a parametric baseline hazard, and uses a Gaussian process to model variations away from it nonparametrically, as well as dependence on covariates. As opposed to many other methods in survival analysis, our framework does not impose unnecessary constraints on the hazard rate or on the survival function. Furthermore, our model handles left, right and interval censoring mechanisms common in survival analysis. We propose an MCMC algorithm to perform inference and an approximation scheme based on random Fourier features to make computations faster. We report experimental results on synthetic and real data, showing that our model performs better than competing models such as Cox proportional hazards, ANOVA-DDP and random survival forests.
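Schematically (our notation; the paper's exact link function may differ), such a model takes a parametric baseline hazard $\lambda_0(t)$ and modulates it by a positive transform $\sigma(\cdot)$ of a Gaussian process $l$ that depends on time and covariates $X$:

$$ \lambda(t \mid X) \;=\; \lambda_0(t)\, \sigma\!\big(l(t, X)\big), \qquad l \sim \mathcal{GP}, $$

so when the GP term carries little information the model falls back to the parametric baseline, and departures from it are learned nonparametrically from the data.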

Bayesian nonparametrics for Sparse Dynamic Networks

Jul 06, 2016

Konstantina Palla, Francois Caron, Yee Whye Teh

We propose a Bayesian nonparametric prior for time-varying networks. To each node of the network is associated a positive parameter, modeling the sociability of that node. Sociabilities are assumed to evolve over time, and are modeled via a dynamic point process model. The model is able to (a) capture smooth evolution of the interaction between nodes, allowing edges to appear/disappear over time; (b) capture long-term evolution of the sociabilities of the nodes; and (c) yield sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying gamma process. We provide some theoretical insights into the model and apply it to three real world datasets.

DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression

Feb 15, 2016

Jovana Mitrovic, Dino Sejdinovic, Yee Whye Teh

A marginal sampler for $σ$-Stable Poisson-Kingman mixture models

Sep 24, 2015

María Lomelí, Stefano Favaro, Yee Whye Teh

We investigate the class of $\sigma$-stable Poisson-Kingman random probability measures (RPMs) in the context of Bayesian nonparametric mixture modeling. This is a large class of discrete RPMs which encompasses most of the popular discrete RPMs used in Bayesian nonparametrics, such as the Dirichlet process, the Pitman-Yor process, the normalized inverse Gaussian process and the normalized generalized Gamma process. We show how certain sampling properties and marginal characterizations of $\sigma$-stable Poisson-Kingman RPMs can be usefully exploited for devising a Markov chain Monte Carlo (MCMC) algorithm for making inference in Bayesian nonparametric mixture modeling. Specifically, we introduce a novel and efficient MCMC sampling scheme in an augmented space that has a fixed number of auxiliary variables per iteration. We apply our sampling scheme to density estimation and clustering tasks with unidimensional and multidimensional datasets, and we compare it against competing sampling schemes.