* 15 pages - fix broken pagination in v2

Controlling Covariate Shift using Equilibrium Normalization of Weights

Dec 11, 2018

Aaron Defazio, Léon Bottou

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Dec 11, 2018

Aaron Defazio, Léon Bottou

Towards Principled Methods for Training Generative Adversarial Networks

Jan 17, 2017

Martin Arjovsky, Léon Bottou

This paper presents a lower bound for optimizing a finite sum of $n$ functions, where each function is $L$-smooth and the sum is $\mu$-strongly convex. We show that no algorithm can reach an error $\epsilon$ in minimizing all functions from this class in fewer than $\Omega(n + \sqrt{n(\kappa-1)}\log(1/\epsilon))$ iterations, where $\kappa=L/\mu$ is a surrogate condition number. We then compare this lower bound to upper bounds for recently developed methods specializing to this setting. When the functions involved in this sum are not arbitrary but based on i.i.d. random data, we further contrast these complexity results with those for optimal first-order methods that directly optimize the sum. We conclude that considerable caution is necessary for an accurate comparison, and we identify machine learning scenarios where the new methods help computationally.

* Added an erratum; we are currently working on extending the result to randomized algorithms.
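
The complexity bounds above can be compared numerically. A minimal sketch (constants and lower-order terms are omitted, so the numbers only indicate scaling, not exact iteration counts):

```python
import math

def finite_sum_lower_bound(n, kappa, eps):
    """Lower bound Omega(n + sqrt(n*(kappa-1)) * log(1/eps)) on iterations
    for incremental first-order methods minimizing a mu-strongly convex
    sum of n L-smooth functions, with kappa = L/mu. Constants omitted."""
    return n + math.sqrt(n * (kappa - 1)) * math.log(1.0 / eps)

def accelerated_full_gradient(n, kappa, eps):
    """Gradient-evaluation cost of an optimal full-gradient method:
    O(sqrt(kappa) * log(1/eps)) iterations, each touching all n terms."""
    return n * math.sqrt(kappa) * math.log(1.0 / eps)

# For large n and moderate conditioning, the incremental lower bound is
# far below the full-gradient cost, which is where variance-reduced
# finite-sum methods can pay off computationally.
n, kappa, eps = 10**6, 100, 1e-6
print(finite_sum_lower_bound(n, kappa, eps) < accelerated_full_gradient(n, kappa, eps))  # prints True
```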

AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization

Jun 21, 2018

Rachel Ward, Xiaoxia Wu, Leon Bottou

* 17 pages, 3 figures

WNGrad: Learn the Learning Rate in Gradient Descent

Mar 07, 2018

Xiaoxia Wu, Rachel Ward, Léon Bottou

* 10 pages, 3 figures, conference

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

Oct 05, 2017

Levent Sagun, Leon Bottou, Yann LeCun

* ICLR submission, 2016 - updated to match the openreview.net version

Optimization Methods for Large-Scale Machine Learning

Feb 08, 2018

Léon Bottou, Frank E. Curtis, Jorge Nocedal

Algorithms for hyperparameter optimization abound, all of which work well under different and often unverifiable assumptions. Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization. This work is naturally framed in the extreme bandit setting, which deals with sequentially choosing which distribution from a collection to sample in order to minimize (maximize) the single best cost (reward). Whereas the distributions in the standard bandit setting are primarily characterized by their means, a number of subtleties arise when we care about the minimal cost as opposed to the average cost. For example, there may not be a well-defined "best" distribution as there is in the standard bandit setting. The best distribution depends on the rewards that have been obtained and on the remaining time horizon. Whereas in the standard bandit setting, it is sensible to compare policies with an oracle which plays the single best arm, in the extreme bandit setting, there are multiple sensible oracle models. We define a sensible notion of "extreme regret" in the extreme bandit setting, which parallels the concept of regret in the standard bandit setting. We then prove that no policy can asymptotically achieve no extreme regret.

* 11 pages, International Conference on Artificial Intelligence and Statistics, 2016
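
The horizon-dependence of the "best" distribution described above is easy to see in simulation. A toy sketch (the two distributions are illustrative choices, not taken from the paper): one arm has the higher mean, the other a heavier upper tail, and which one yields the larger maximum flips with the horizon.

```python
import random

def best_arm_wins_for_max(horizon, trials=2000, seed=0):
    """Fraction of trials in which arm B achieves the larger *maximum*
    reward over `horizon` pulls. Arm A is uniform on [0, 1] (higher mean);
    arm B pays 2.0 with probability 0.01 and 0 otherwise (lower mean,
    heavier upper tail)."""
    rng = random.Random(seed)
    wins_b = 0
    for _ in range(trials):
        max_a = max(rng.random() for _ in range(horizon))
        max_b = max(2.0 if rng.random() < 0.01 else 0.0 for _ in range(horizon))
        wins_b += max_b > max_a
    return wins_b / trials

# Short horizon: arm B almost never pays out, so arm A's maximum wins.
# Long horizon: arm B almost surely hits 2.0 at least once and wins.
print(best_arm_wins_for_max(5))
print(best_arm_wins_for_max(500))
```

This is exactly why no single "best" distribution exists in the extreme bandit setting the way a best arm exists in the standard setting.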

A Parallel SGD method with Strong Convergence

Nov 04, 2013

Dhruv Mahajan, S. Sathiya Keerthi, S. Sundararajan, Leon Bottou

Geometrical Insights for Implicit Generative Modeling

Mar 12, 2018

Leon Bottou, Martin Arjovsky, David Lopez-Paz, Maxime Oquab

Unifying distillation and privileged information

Feb 26, 2016

David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik

* Proceedings of the International Conference on Learning Representations (2016) 1-10

SING: Symbol-to-Instrument Neural Generator

Oct 23, 2018

Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because they are based on autoregressive models that generate audio samples one at a time at a rate of 16 kHz. In this work, we study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. We present SING, a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity. Our model is trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms. On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.

* Conference on Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada
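
A loss of the kind the abstract describes, comparing log spectrograms of generated and target waveforms, can be sketched with plain numpy. The frame size, hop, window, epsilon inside the log, and the L1 distance are assumptions here, not the paper's exact settings:

```python
import numpy as np

def log_spectrogram(x, frame=256, hop=128, eps=1e-5):
    """Log-magnitude STFT of a 1-D waveform (numpy-only sketch)."""
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame + 1, hop)]
    mag = np.abs(np.fft.rfft(np.stack(frames), axis=-1))
    return np.log(eps + mag)

def spectral_loss(generated, target):
    """Mean absolute distance between log spectrograms: the kind of
    objective SING minimizes between generated and target waveforms."""
    return np.mean(np.abs(log_spectrogram(generated) - log_spectrogram(target)))

# One second of audio at 16 kHz: identical tones give zero loss,
# a tone an octave apart gives a positive loss.
t = np.arange(16000) / 16000.0
a = np.sin(2 * np.pi * 440 * t)
b = np.sin(2 * np.pi * 880 * t)
print(spectral_loss(a, a))  # prints 0.0
print(spectral_loss(a, b) > 0.0)  # prints True
```

Comparing spectrograms rather than raw samples makes the loss insensitive to phase, which is one reason frame-by-frame generation with large strides becomes feasible.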

An efficient distributed learning algorithm based on effective local functional approximations

Mar 16, 2015

Dhruv Mahajan, Nikunj Agrawal, S. Sathiya Keerthi, S. Sundararajan, Leon Bottou

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

May 07, 2018

Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

* Minor update for ICLR 2018 Workshop Track presentation

Discovering Causal Signals in Images

Oct 31, 2017

David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, Léon Bottou
