Data analysis and machine learning have become an integral part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and providing insights about the problem. Yet, caution is needed to avoid using machine learning as a black-box tool: it should rather be regarded as a methodology, with a rational thought process that is entirely dependent on the problem under study. In particular, the use of algorithms should ideally require a reasonable understanding of their mechanisms, properties and limitations, in order to better apprehend and interpret their results. Accordingly, the goal of this thesis is to provide an in-depth analysis of random forests, consistently calling into question each and every part of the algorithm, in order to shed new light on its learning capabilities, inner workings and interpretability. The first part of this work studies the induction of decision trees and the construction of ensembles of randomized trees, motivating their design and purpose whenever possible. Our contributions follow with an original complexity analysis of random forests, showing their good computational performance and scalability, along with an in-depth discussion of their implementation details, as contributed within Scikit-Learn. In the second part of this work, we analyse and discuss the interpretability of random forests through the lens of variable importance measures. The core of our contributions rests in the theoretical characterization of the Mean Decrease of Impurity variable importance measure, from which we prove and derive some of its properties in the case of multiway totally randomized trees and in asymptotic conditions. As a consequence of this work, our analysis demonstrates that variable importances [...].

* PhD thesis. Source code available at https://github.com/glouppe/phd-thesis
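As a concrete illustration of the Mean Decrease of Impurity (MDI) importances analysed in the thesis, Scikit-Learn exposes them through the `feature_importances_` attribute of its forest estimators. A minimal sketch, with an illustrative dataset and settings:

```python
# Sketch of MDI variable importances as exposed by Scikit-Learn.
# The dataset and hyperparameters below are illustrative only.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# MDI importance of a feature: total impurity decrease brought by splits
# on that feature, averaged over all trees and normalized to sum to 1.
for name, imp in zip(load_iris().feature_names, forest.feature_importances_):
    print(f"{name}: {imp:.3f}")
```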

Gradient Energy Matching for Distributed Asynchronous Gradient Descent

May 22, 2018

Joeri Hermans, Gilles Louppe


Recurrent machines for likelihood-free inference

Nov 30, 2018

Arthur Pesah, Antoine Wehenkel, Gilles Louppe


* NeurIPS 2018 Workshop on Meta-learning (MetaLearn 2018)


Adversarial Variational Optimization of Non-Differentiable Simulators

Oct 05, 2018

Gilles Louppe, Joeri Hermans, Kyle Cranmer


* v1: Original submission. v2: Fixed references. v3: version submitted to NIPS'2017. Code available at https://github.com/glouppe/paper-learning-to-pivot


Approximating Likelihood Ratios with Calibrated Discriminative Classifiers

Mar 18, 2016

Kyle Cranmer, Juan Pavez, Gilles Louppe

In many fields of science, generalized likelihood ratio tests are established tools for statistical inference. At the same time, it has become increasingly common that a simulator (or generative model) is used to describe complex processes that tie parameters $\theta$ of an underlying theory and measurement apparatus to high-dimensional observations $\mathbf{x}\in \mathbb{R}^p$. However, simulators often do not provide a way to evaluate the likelihood function for a given observation $\mathbf{x}$, which motivates a new class of likelihood-free inference algorithms. In this paper, we show that likelihood ratios are invariant under a specific class of dimensionality reduction maps $\mathbb{R}^p \to \mathbb{R}$. As a direct consequence, we show that discriminative classifiers can be used to approximate the generalized likelihood ratio statistic when only a generative model for the data is available. This leads to a new machine learning-based approach to likelihood-free inference that is complementary to Approximate Bayesian Computation, and which does not require a prior on the model parameters. Experimental results on artificial problems with known exact likelihoods illustrate the potential of the proposed method.

* 35 pages, 5 figures
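The classifier-based likelihood-ratio approximation described in the abstract can be illustrated with a toy sketch (not the paper's code): a probabilistic classifier $s(\mathbf{x})$ trained to separate samples drawn at $\theta_0$ from samples drawn at $\theta_1$ yields $\hat{r}(\mathbf{x}) = s(\mathbf{x})/(1-s(\mathbf{x}))$ as an estimate of $p(\mathbf{x}|\theta_1)/p(\mathbf{x}|\theta_0)$. The two unit Gaussians below are a hypothetical stand-in for a simulator:

```python
# Toy illustration of the classifier-based likelihood-ratio trick.
# The two Gaussians play the role of an intractable simulator run at
# theta_0 and theta_1; they are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=50_000)   # samples from p(x|theta_0)
x1 = rng.normal(1.0, 1.0, size=50_000)   # samples from p(x|theta_1)

X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.concatenate([np.zeros_like(x0), np.ones_like(x1)])

# With balanced classes, s(x) estimates p(theta_1 | x), so that
# s / (1 - s) estimates the likelihood ratio p(x|theta_1) / p(x|theta_0).
clf = LogisticRegression().fit(X, y)
s = clf.predict_proba(np.array([[0.5]]))[0, 1]
ratio_hat = s / (1.0 - s)

# For these two unit Gaussians the exact ratio at x = 0.5 is 1.
print(ratio_hat)
```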


Deep Quality-Value (DQV) Learning

Oct 10, 2018

Matthia Sabatelli, Gilles Louppe, Pierre Geurts, Marco A. Wiering


Mining gold from implicit models to improve likelihood-free inference

Oct 09, 2018

Johann Brehmer, Gilles Louppe, Juan Pavez, Kyle Cranmer


* Code available at https://github.com/johannbrehmer/simulator-mining-example . v2: Fixed typos. v3: Expanded discussion, added Lotka-Volterra example


A Guide to Constraining Effective Field Theories with Machine Learning

Jul 26, 2018

Johann Brehmer, Kyle Cranmer, Gilles Louppe, Juan Pavez

We develop, discuss, and compare several inference techniques to constrain theory parameters in collider experiments. By harnessing the latent-space structure of particle physics processes, we extract extra information from the simulator. This augmented data can be used to train neural networks that precisely estimate the likelihood ratio. The new methods scale well to many observables and high-dimensional parameter spaces, do not require any approximations of the parton shower and detector response, and can be evaluated in microseconds. Using weak-boson-fusion Higgs production as an example process, we compare the performance of several techniques. The best results are found for likelihood ratio estimators trained with extra information about the score, the gradient of the log likelihood function with respect to the theory parameters. The score also provides sufficient statistics that contain all the information needed for inference in the neighborhood of the Standard Model. These methods enable us to put significantly stronger bounds on effective dimension-six operators than the traditional approach based on histograms. They also outperform generic machine learning methods that do not make use of the particle physics structure, demonstrating their potential to substantially improve the new physics reach of the LHC legacy results.

* Phys. Rev. D 98, 052004 (2018)

* See also the companion publication "Constraining Effective Field Theories with Machine Learning" at arXiv:1805.00013, a brief introduction presenting the key ideas. The code for these studies is available at https://github.com/johannbrehmer/higgs_inference . v2: Added references. v3: Improved description of algorithms, added references. v4: Clarified text, added references
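The role of the score $t(\mathbf{x}) = \nabla_\theta \log p(\mathbf{x}|\theta)|_{\theta_0}$ as a locally sufficient statistic can be sketched with a hypothetical one-parameter toy "simulator" (not from the paper) where the score is analytic, $x \sim \mathcal{N}(\theta, 1)$:

```python
# Hypothetical toy example of score-based local inference. For a
# unit-variance Gaussian, the score at theta_0 is t(x) = x - theta_0
# and the Fisher information is I = 1, so a single Fisher-scoring step
# from theta_0 recovers the maximum-likelihood estimate.
import numpy as np

rng = np.random.default_rng(0)
theta_true, theta_0 = 0.3, 0.0
x = rng.normal(theta_true, 1.0, size=100_000)

score = x - theta_0      # per-event score evaluated at theta_0
fisher_info = 1.0        # exact for a unit-variance Gaussian

# One Fisher-scoring step using only the averaged score:
theta_hat = theta_0 + score.mean() / fisher_info
print(theta_hat)         # close to theta_true = 0.3
```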


Constraining Effective Field Theories with Machine Learning

Jul 26, 2018

Johann Brehmer, Kyle Cranmer, Gilles Louppe, Juan Pavez

We present powerful new analysis techniques to constrain effective field theories at the LHC. By leveraging the structure of particle physics processes, we extract extra information from Monte-Carlo simulations, which can be used to train neural network models that estimate the likelihood ratio. These methods scale well to processes with many observables and theory parameters, do not require any approximations of the parton shower or detector response, and can be evaluated in microseconds. We show that they allow us to put significantly stronger bounds on dimension-six operators than existing methods, demonstrating their potential to improve the precision of the LHC legacy constraints.

* Phys. Rev. Lett. 121, 111801 (2018)

* See also the companion publication "A Guide to Constraining Effective Field Theories with Machine Learning" at arXiv:1805.00020, an in-depth analysis of machine learning techniques for LHC measurements. The code for these studies is available at https://github.com/johannbrehmer/higgs_inference . v2: New schematic figure explaining the new algorithms, added references. v3, v4: Added references


QCD-Aware Recursive Neural Networks for Jet Physics

Jul 13, 2018

Gilles Louppe, Kyunghyun Cho, Cyril Becot, Kyle Cranmer

Recent progress in applying machine learning for jet physics has been built upon an analogy between calorimeters and images. In this work, we present a novel class of recursive neural networks built instead upon an analogy between QCD and natural languages. In the analogy, four-momenta are like words and the clustering history of sequential recombination jet algorithms is like the parsing of a sentence. Our approach works directly with the four-momenta of a variable-length set of particles, and the jet-based tree structure varies on an event-by-event basis. Our experiments highlight the flexibility of our method for building task-specific jet embeddings and show that recursive architectures are significantly more accurate and data efficient than previous image-based networks. We extend the analogy from individual jets (sentences) to full events (paragraphs), and show for the first time an event-level classifier operating on all the stable particles produced in an LHC event.

* 16 pages, 5 figures, 3 appendices, corresponding code at https://github.com/glouppe/recnn
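The recursion over the clustering tree can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's architecture): leaves hold four-momenta, and each internal node combines its children's embeddings with a shared weight matrix, mirroring how a parser composes words into a sentence.

```python
# Hypothetical sketch of a recursive embedding over a jet clustering tree.
# Weights are random stand-ins for learned parameters; shapes and the
# activation are illustrative only.
import numpy as np

DIM = 8
rng = np.random.default_rng(0)
W_leaf = 0.1 * rng.normal(size=(DIM, 4))        # projects one four-momentum
W_node = 0.1 * rng.normal(size=(DIM, 2 * DIM))  # combines two child embeddings

def embed(node):
    """node is a four-momentum array of shape (4,) or a (left, right) pair."""
    if isinstance(node, np.ndarray):
        return np.tanh(W_leaf @ node)
    left, right = node
    return np.tanh(W_node @ np.concatenate([embed(left), embed(right)]))

# A tiny jet of three particles clustered as ((p1, p2), p3): the output
# is a fixed-size embedding regardless of the number of particles.
p1 = np.array([10.0, 1.0, 0.5, 9.9])
p2 = np.array([5.0, -0.5, 0.2, 4.9])
p3 = np.array([2.0, 0.1, -0.3, 1.9])
jet_embedding = embed(((p1, p2), p3))
print(jet_embedding.shape)
```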


Ethnicity sensitive author disambiguation using semi-supervised learning

May 04, 2016

Gilles Louppe, Hussein Al-Natsheh, Mateusz Susik, Eamonn Maguire


Likelihood-free inference with an improved cross-entropy estimator

Aug 02, 2018

Markus Stoye, Johann Brehmer, Gilles Louppe, Juan Pavez, Kyle Cranmer


* 8 pages, 3 figures


Random Subspace with Trees for Feature Selection Under Memory Constraints

Sep 06, 2017

Antonio Sutera, Célia Châtel, Gilles Louppe, Louis Wehenkel, Pierre Geurts


Context-dependent feature analysis with random forests

May 12, 2016

Antonio Sutera, Gilles Louppe, Vân Anh Huynh-Thu, Louis Wehenkel, Pierre Geurts


* Accepted for presentation at UAI 2016


Simple connectome inference from partial correlation statistics in calcium imaging

Nov 18, 2014

Antonio Sutera, Arnaud Joly, Vincent François-Lavet, Zixiao Aaron Qiu, Gilles Louppe, Damien Ernst, Pierre Geurts


Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model

Sep 01, 2018

Atilim Gunes Baydin, Lukas Heinrich, Wahid Bhimji, Bradley Gram-Hansen, Gilles Louppe, Lei Shao, Prabhat, Kyle Cranmer, Frank Wood


* 18 pages, 5 figures


Improvements to Inference Compilation for Probabilistic Programming in Large-Scale Scientific Simulators

Dec 21, 2017

Mario Lezcano Casado, Atilim Gunes Baydin, David Martinez Rubio, Tuan Anh Le, Frank Wood, Lukas Heinrich, Gilles Louppe, Kyle Cranmer, Karen Ng, Wahid Bhimji, Prabhat


* 7 pages, 2 figures


API design for machine learning software: experiences from the scikit-learn project

Sep 01, 2013

Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, Gaël Varoquaux

Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.

* European Conference on Machine Learning and Principles and Practices of Knowledge Discovery in Databases (2013)
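The shared interface described in the abstract is the familiar `fit`/`predict` contract, and composition follows from it directly through meta-estimators such as `Pipeline`. A short sketch with an illustrative dataset and settings:

```python
# The uniform estimator interface: every learning unit exposes fit,
# predictors add predict/score, and a Pipeline chaining a transformer
# and a final estimator itself obeys the same contract.
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.score(X, y))
```

Because the composite obeys the same interface as its parts, it can be dropped into any tool expecting an estimator, e.g. cross-validation or grid search.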


Scikit-learn: Machine Learning in Python

Jun 05, 2018

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay


* Journal of Machine Learning Research (2011)

* Update authors list and URLs
