Celer: a Fast Solver for the Lasso with Dual Extrapolation

Jun 06, 2018

Mathurin Massias, Alexandre Gramfort, Joseph Salmon

Convex sparsity-inducing regularizations are ubiquitous in high-dimensional machine learning, but solving the resulting optimization problems can be slow. To accelerate solvers, state-of-the-art approaches reduce the size of the optimization problem at hand. In the context of regression, this can be achieved either by discarding irrelevant features (screening techniques) or by prioritizing features likely to belong to the support of the solution (working set techniques). Duality comes into play at several steps in these techniques. Here, we propose an extrapolation technique, starting from a sequence of dual iterates, that constructs improved dual points. This enables tighter control of optimality through the duality-gap stopping criterion, as well as better screening performance of Gap Safe rules. Finally, we propose a working set strategy based on an aggressive use of Gap Safe screening rules. Thanks to our new dual point construction, we show significant computational speedups on multiple real-world problems.
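The duality-gap control mentioned in the abstract can be illustrated for the Lasso $\min_w \frac{1}{2}\|y - Xw\|^2 + \lambda\|w\|_1$: rescaling the residual gives a dual-feasible point, and the gap between primal and dual objectives certifies how far a candidate $w$ is from optimal. This is a minimal sketch of the gap computation only (function name is ours), not the paper's extrapolated dual construction:

```python
import numpy as np

def duality_gap(X, y, w, lam):
    """Duality gap for the Lasso via residual rescaling (illustrative sketch).

    The residual r is scaled so that theta satisfies the dual constraint
    ||X^T theta||_inf <= 1; weak duality then makes the gap nonnegative.
    """
    r = y - X @ w
    theta = r / max(lam, np.linalg.norm(X.T @ r, np.inf))
    p_obj = 0.5 * r @ r + lam * np.abs(w).sum()
    d_obj = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    return p_obj - d_obj
```

At $\lambda \ge \lambda_{\max} = \|X^\top y\|_\infty$, the zero vector is optimal and the gap computed this way vanishes exactly, which is what makes it usable as a stopping criterion.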

From safe screening rules to working sets for faster Lasso-type solvers

May 01, 2017

Mathurin Massias, Alexandre Gramfort, Joseph Salmon

Mind the duality gap: safer rules for the Lasso

Dec 03, 2015

Olivier Fercoq, Alexandre Gramfort, Joseph Salmon

Screening rules allow irrelevant variables to be discarded early from the optimization in Lasso problems (and their variants), making solvers faster. In this paper, we propose new versions of the so-called $\textit{safe rules}$ for the Lasso. Based on duality gap considerations, our new rules create safe test regions whose diameters converge to zero, provided that one relies on a converging solver. This property helps screen out more variables, for a wider range of regularization parameter values. In addition to faster convergence, we prove that we correctly identify the active sets (supports) of the solutions in finite time. While our proposed strategy can cope with any solver, its performance is demonstrated using a coordinate descent algorithm particularly adapted to machine learning use cases. Significant computing time reductions are obtained with respect to previous safe rules.
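The duality-gap-based test region described here is a sphere centered at a dual-feasible point with radius $\sqrt{2\,\mathrm{gap}}/\lambda$: any feature whose worst-case correlation over that sphere stays below 1 is provably inactive. A minimal sketch for the Lasso (variable names are ours):

```python
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """Boolean mask of features provably absent from the Lasso solution.

    Builds a dual-feasible point by residual rescaling, then screens feature j
    when |x_j^T theta| + ||x_j|| * radius < 1, with radius = sqrt(2*gap)/lam.
    """
    r = y - X @ w
    theta = r / max(lam, np.linalg.norm(X.T @ r, np.inf))
    p_obj = 0.5 * r @ r + lam * np.abs(w).sum()
    d_obj = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    radius = np.sqrt(2 * max(p_obj - d_obj, 0.0)) / lam
    scores = np.abs(X.T @ theta) + np.linalg.norm(X, axis=0) * radius
    return scores < 1.0
```

As the solver converges the gap shrinks, the sphere's diameter goes to zero, and the mask grows: this is the "converging safe region" property the abstract refers to.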

A two-stage denoising filter: the preprocessed Yaroslavsky filter

Aug 31, 2012

Joseph Salmon, Rebecca Willett, Ery Arias-Castro

Oracle inequalities and minimax rates for non-local means and related adaptive kernel-based methods

Apr 26, 2012

Ery Arias-Castro, Joseph Salmon, Rebecca Willett

This paper describes a novel theoretical characterization of the performance of non-local means (NLM) for noise removal. NLM has proven effective in a variety of empirical studies, but little is understood fundamentally about how it performs relative to classical methods based on wavelets or how various parameters (e.g., patch size) should be chosen. For cartoon images and images which may contain thin features and regular textures, the error decay rates of NLM are derived and compared with those of linear filtering, oracle estimators, variable-bandwidth kernel methods, Yaroslavsky's filter and wavelet thresholding estimators. The trade-off between global and local search for matching patches is examined, and the bias reduction associated with the local polynomial regression version of NLM is analyzed. The theoretical results are validated via simulations for 2D images corrupted by additive white Gaussian noise.
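The patch-weighting principle analyzed in the paper can be shown on a toy 1D signal: each sample is replaced by a weighted average of all samples, with weights decaying in the squared distance between their surrounding patches. This is an illustrative sketch only (patch size and bandwidth `h` are arbitrary choices, not values from the paper):

```python
import numpy as np

def nlm_1d(x, patch=3, h=0.5):
    """Toy 1D non-local means: average samples weighted by patch similarity."""
    n = len(x)
    pad = patch // 2
    xp = np.pad(x, pad, mode="edge")
    # one patch of length `patch` centered on each sample
    patches = np.stack([xp[i:i + patch] for i in range(n)])
    d2 = ((patches[:, None, :] - patches[None, :, :]) ** 2).mean(axis=-1)
    w = np.exp(-d2 / h ** 2)            # similar patches get large weights
    w /= w.sum(axis=1, keepdims=True)   # weights sum to one per output sample
    return w @ x
```

Because the weights are normalized, a constant signal passes through unchanged; on piecewise-constant signals, samples borrow strength from all similar patches, not just spatial neighbors, which is the "global search" side of the trade-off studied in the paper.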

Gap Safe screening rules for sparsity enforcing penalties

Dec 27, 2017

Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon

Generalized Concomitant Multi-Task Lasso for sparse multimodal regression

Oct 18, 2017

Mathurin Massias, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon

In high dimension, it is customary to consider Lasso-type estimators to enforce sparsity. For standard Lasso theory to hold, the regularization parameter should be proportional to the noise level, yet the latter is generally unknown in practice. A possible remedy is to consider estimators, such as the Concomitant/Scaled Lasso, which jointly optimize over the regression coefficients as well as over the noise level, making the choice of the regularization independent of the noise level. However, when data from different sources are pooled to increase sample size, or when dealing with multimodal datasets, noise levels typically differ and new dedicated estimators are needed. In this work we provide new statistical and computational solutions to deal with such heteroscedastic regression models, with an emphasis on functional brain imaging with combined magneto- and electroencephalographic (M/EEG) signals. Adopting the formulation of Concomitant Lasso-type estimators, we propose a jointly convex formulation to estimate both the regression coefficients and the (square root of the) noise covariance. When our framework is instantiated to de-correlated noise, it leads to an efficient algorithm whose computational cost is not higher than for the Lasso and Concomitant Lasso, while addressing more complex noise structures. Numerical experiments demonstrate that our estimator yields improved prediction and support identification while correctly estimating the noise (square root) covariance. Results on multimodal neuroimaging problems with M/EEG data are also reported.
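A simplified, single-task, homoscedastic instance of this family is the smoothed concomitant objective $\min_{w,\sigma \ge \sigma_0} \frac{\|y - Xw\|^2}{2n\sigma} + \frac{\sigma}{2} + \lambda\|w\|_1$, which can be attacked by alternating a closed-form $\sigma$ update with a proximal gradient step on $w$. This is a sketch of that special case (our function names and solver choice), not the paper's multi-task noise-covariance estimator:

```python
import numpy as np

def smoothed_concomitant(X, y, lam, sigma0=1e-3, n_iter=500):
    """Jointly estimate Lasso coefficients and noise level (homoscedastic sketch).

    Alternates: sigma <- max(||r|| / sqrt(n), sigma0)  (closed form), then one
    ISTA step on w for the Lasso-in-w subproblem at the current sigma.
    """
    n, p = X.shape
    w = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2  # squared spectral norm: Lipschitz scale
    for _ in range(n_iter):
        r = y - X @ w
        sigma = max(np.linalg.norm(r) / np.sqrt(n), sigma0)
        z = w + X.T @ r / L  # gradient step on ||y - Xw||^2 / (2 n sigma)
        w = np.sign(z) * np.maximum(np.abs(z) - lam * n * sigma / L, 0.0)
    return w, sigma
```

The key point the abstract makes survives even in this sketch: the per-iteration cost is a Lasso-type step plus a residual norm, so handling the unknown noise level costs essentially nothing extra.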

On the benefits of output sparsity for multi-label classification

Mar 14, 2017

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Joseph Salmon

Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions

Jun 08, 2016

Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

GAP Safe Screening Rules for Sparse-Group-Lasso

Feb 19, 2016

Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon

GAP Safe screening rules for sparse multi-task and multi-class models

Nov 18, 2015

Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon

Extending Gossip Algorithms to Distributed Estimation of U-Statistics

Nov 17, 2015

Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

Adaptive Multinomial Matrix Completion

Aug 26, 2014

Olga Klopp, Jean Lafond, Eric Moulines, Joseph Salmon

The task of estimating a matrix given a sample of observed entries is known as the \emph{matrix completion problem}. Most works on matrix completion have focused on recovering an unknown real-valued low-rank matrix from a random sample of its entries. Here, we investigate the case of highly quantized observations, when the measurements can take only a small number of values. These quantized outputs are generated according to a probability distribution parametrized by the unknown matrix of interest. This model corresponds, for example, to ratings in recommender systems or labels in multi-class classification. We consider a general, non-uniform sampling scheme and give theoretical guarantees on the performance of a constrained, nuclear norm penalized maximum likelihood estimator. One important advantage of this estimator is that it does not require knowledge of the rank or an upper bound on the nuclear norm of the unknown matrix and, thus, it is adaptive. We provide lower bounds showing that our estimator is minimax optimal. An efficient algorithm based on lifted coordinate gradient descent is proposed to compute the estimator. A limited Monte-Carlo experiment, using both simulated and real data, is provided to support our claims.
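The standard building block for nuclear-norm-penalized estimators of this kind is singular-value soft-thresholding, the proximal operator of the nuclear norm. The sketch below shows that operator only (not the lifted coordinate gradient descent algorithm the paper actually proposes):

```python
import numpy as np

def svd_soft_threshold(M, tau):
    """Proximal operator of tau * ||.||_* : shrink each singular value by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Shrinking singular values toward zero is what biases the estimate toward low rank; with `tau` at or above the largest singular value, the operator returns the zero matrix.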

Characterizing the maximum parameter of the total-variation denoising through the pseudo-inverse of the divergence

Dec 08, 2016

Charles-Alban Deledalle, Nicolas Papadakis, Joseph Salmon, Samuel Vaiter

We focus on the maximum regularization parameter for anisotropic total-variation denoising: the minimum value of the regularization parameter above which the solution remains constant. While this value is well known for the Lasso, such a critical value has not been investigated in detail for the total variation. Yet it matters when tuning the regularization parameter, since it fixes an upper bound on the grid over which the optimal parameter is sought. We establish a closed-form expression for the one-dimensional case, as well as an upper bound for the two-dimensional case that appears reasonably tight in practice. This problem is directly linked to the computation of the pseudo-inverse of the divergence, which can be obtained quickly by performing convolutions in the Fourier domain.
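In the one-dimensional case, applying the (here explicit) pseudo-inverse of the discrete divergence to the centered signal reduces to cumulative sums, so the critical value is the sup-norm of the partial sums of $y - \bar{y}$. A sketch consistent with that characterization (our derivation, not code from the paper):

```python
import numpy as np

def tv1d_lambda_max(y):
    """Smallest lambda above which 1D TV denoising returns a constant signal.

    Solving D^T z = y - mean(y) for the 1D difference operator D gives
    z_k = -cumsum(y - mean(y))_k, and lambda_max = ||z||_inf.
    """
    y = np.asarray(y, dtype=float)
    return np.abs(np.cumsum(y - y.mean())[:-1]).max()
```

For a two-level step signal like `[0, 0, 1, 1]`, the two plateaus are pulled together at rate $\lambda/2$ each and merge at $\lambda = 1$, matching the formula.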

Poisson noise reduction with non-local PCA

Apr 28, 2014

Joseph Salmon, Zachary Harmany, Charles-Alban Deledalle, Rebecca Willett

Learning Heteroscedastic Models by Convex Programming under Group Sparsity

Apr 16, 2013

Arnak S. Dalalyan, Mohamed Hebiri, Katia Méziani, Joseph Salmon

Popular sparse estimation methods based on $\ell_1$-relaxation, such as the Lasso and the Dantzig selector, require the knowledge of the variance of the noise in order to properly tune the regularization parameter. This constitutes a major obstacle in applying these methods in several frameworks---such as time series, random fields, inverse problems---for which the noise is rarely homoscedastic and its level is hard to know in advance. In this paper, we propose a new approach to the joint estimation of the conditional mean and the conditional variance in a high-dimensional (auto-) regression setting. An attractive feature of the proposed estimator is that it is efficiently computable even for very large scale problems by solving a second-order cone program (SOCP). We present theoretical analysis and numerical results assessing the performance of the proposed procedure.
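To see where the second-order cone structure comes from, consider a simplified homoscedastic special case of such joint mean/variance estimation, the jointly convex concomitant objective (our notation; the paper's group-sparse heteroscedastic estimator generalizes this):

```latex
\min_{w,\,\sigma > 0}\; \frac{\|y - Xw\|_2^2}{2n\sigma} + \frac{\sigma}{2} + \lambda \|w\|_1
\quad\Longleftrightarrow\quad
\min_{w,\,\sigma,\,t}\; \frac{t}{2n} + \frac{\sigma}{2} + \lambda \|w\|_1
\;\;\text{s.t.}\;\; \|y - Xw\|_2^2 \le t\,\sigma .
```

The epigraph constraint $\|y - Xw\|_2^2 \le t\sigma$ is a rotated second-order cone, and the remaining terms are linear (after the standard epigraph split of the $\ell_1$ norm), which is why the estimator is computable at scale by off-the-shelf SOCP solvers.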

Safe Grid Search with Optimal Complexity

Oct 12, 2018

Eugene Ndiaye, Tam Le, Olivier Fercoq, Joseph Salmon, Ichiro Takeuchi

Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

Jun 08, 2016

Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Vincent Leclère, Joseph Salmon
