Implicit differentiation of Lasso-type models for hyperparameter
optimization
- URL: http://arxiv.org/abs/2002.08943v3
- Date: Thu, 3 Sep 2020 16:53:44 GMT
- Title: Implicit differentiation of Lasso-type models for hyperparameter
optimization
- Authors: Quentin Bertrand and Quentin Klopfenstein and Mathieu Blondel and
Samuel Vaiter and Alexandre Gramfort and Joseph Salmon
- Abstract summary: We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
- Score: 82.73138686390514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Setting regularization parameters for Lasso-type estimators is notoriously
difficult, though crucial in practice. The most popular hyperparameter
optimization approach is grid-search using held-out validation data.
Grid-search, however, requires choosing a predefined grid for each parameter,
which scales exponentially in the number of parameters. Another approach is to
cast hyperparameter optimization as a bi-level optimization problem that can be
solved by gradient descent. The key challenge for these methods is the
estimation of the gradient with respect to the hyperparameters. Computing this
gradient via forward or backward automatic differentiation is possible yet
usually suffers from high memory consumption. Alternatively, implicit
differentiation typically involves solving a linear system which can be
prohibitive and numerically unstable in high dimension. In addition, implicit
differentiation usually assumes smooth loss functions, which is not the case
for Lasso-type problems. This work introduces an efficient implicit
differentiation algorithm, without matrix inversion, tailored for Lasso-type
problems. Our approach scales to high-dimensional data by leveraging the
sparsity of the solutions. Experiments demonstrate that the proposed method
outperforms a large number of standard methods to optimize the error on
held-out data, or the Stein Unbiased Risk Estimator (SURE).
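The abstract describes a bi-level view: an inner Lasso problem parameterized by the regularization strength, and an outer held-out (or SURE) criterion minimized over that hyperparameter by gradient descent. As a rough illustration of the implicit-differentiation idea, the sketch below computes the hypergradient of a validation loss by differentiating the Lasso optimality conditions restricted to the support of the solution. It is a minimal reconstruction under assumed conventions (the (1/(2n)) least-squares scaling, scikit-learn's Lasso solver, a direct solve of the small |S| x |S| system), not the authors' exact matrix-inversion-free algorithm.

```python
# Hedged sketch: hypergradient of a held-out validation loss w.r.t. the Lasso
# regularization parameter lam, via implicit differentiation restricted to the
# support of the solution. Inner objective assumed: (1/(2n)) ||y - X b||^2 + lam ||b||_1.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_hypergradient(X_tr, y_tr, X_val, y_val, lam):
    n_tr = X_tr.shape[0]
    # Inner problem: sklearn's Lasso minimizes (1/(2n)) ||y - X b||^2 + alpha ||b||_1,
    # so alpha plays the role of lam here.
    beta = Lasso(alpha=lam, fit_intercept=False, tol=1e-10, max_iter=10_000).fit(X_tr, y_tr).coef_
    support = np.flatnonzero(beta)          # active set S of the sparse solution
    if support.size == 0:
        return 0.0                          # solution is identically zero at this lam
    s = np.sign(beta[support])
    X_S = X_tr[:, support]
    # Stationarity on S:  (1/n) X_S^T (X_S beta_S - y) + lam * s = 0.
    # Differentiating in lam (support and signs locally fixed) gives
    #   d beta_S / d lam = -n (X_S^T X_S)^{-1} s,
    # a system of size |S| only, thanks to the sparsity of the solution.
    dbeta_S = -n_tr * np.linalg.solve(X_S.T @ X_S, s)
    # Outer criterion: C(beta) = (1/(2 n_val)) ||y_val - X_val beta||^2.
    n_val = X_val.shape[0]
    grad_C_S = X_val[:, support].T @ (X_val @ beta - y_val) / n_val
    return float(grad_C_S @ dbeta_S)        # chain rule: dC/dlam

# Example use on synthetic data: one gradient step on log(lam).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 200))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.standard_normal(100)
X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]
lam = 0.1
g = lasso_hypergradient(X_tr, y_tr, X_val, y_val, lam)
lam = np.exp(np.log(lam) - 0.1 * lam * g)   # dC/dlog(lam) = lam * dC/dlam
```

Because the linear system only involves the active features, its size is the support size rather than the ambient dimension, which is the sense in which the approach leverages sparsity.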
Related papers
- Gradient-based bilevel optimization for multi-penalty Ridge regression
through matrix differential calculus [0.46040036610482665]
We introduce a gradient-based approach to the problem of linear regression with l2-regularization.
We show that our approach outperforms LASSO, Ridge, and Elastic Net regression.
The analytical computation of the gradient proves to be more efficient in terms of computational time than automatic differentiation.
arXiv Detail & Related papers (2023-11-23T20:03:51Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional, previously unexplored benefit: it can simultaneously reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Divide and Learn: A Divide and Conquer Approach for Predict+Optimize [50.03608569227359]
The predict+optimize problem combines machine learning of problem coefficients with an optimization problem that uses the predicted coefficients.
We show how to directly express the loss of the optimization problem in terms of the predicted coefficients as a piece-wise linear function.
We propose a novel divide-and-conquer algorithm to tackle optimization problems without this restriction and predict their coefficients using the optimization loss.
arXiv Detail & Related papers (2020-12-04T00:26:56Z) - Convergence Properties of Stochastic Hypergradients [38.64355126221992]
We study approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk on a large dataset.
We provide numerical experiments to support our theoretical analysis and to show the advantage of using hypergradients in practice.
arXiv Detail & Related papers (2020-11-13T20:50:36Z) - Efficient hyperparameter optimization by way of PAC-Bayes bound
minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z) - Optimizing generalization on the train set: a novel gradient-based
framework to train parameters and hyperparameters simultaneously [0.0]
Generalization is a central problem in Machine Learning.
We present a novel approach based on a new measure of risk that allows us to develop novel fully automatic procedures for generalization.
arXiv Detail & Related papers (2020-06-11T18:04:36Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
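Related to the entry above on implicit differentiation for non-smooth convex learning, the sketch below illustrates forward-mode (iterative) differentiation of proximal gradient descent (ISTA) for the Lasso: the derivative of the iterate with respect to the regularization parameter is propagated through the soft-thresholding step alongside the iterate itself. This is an illustrative reconstruction under the same assumed (1/(2n)) least-squares scaling as the previous sketch, not the cited paper's implementation.

```python
# Hedged sketch: forward-mode differentiation of ISTA for the Lasso.
# The Jacobian d beta / d lam is propagated alongside the iterates.
import numpy as np

def ista_with_forward_jacobian(X, y, lam, n_iter=500):
    n, p = X.shape
    L = np.linalg.norm(X, ord=2) ** 2 / n       # Lipschitz constant of the smooth part
    beta = np.zeros(p)
    jac = np.zeros(p)                           # d beta / d lam, initialized at 0
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - grad / L                     # gradient step
        dz = jac - (X.T @ (X @ jac)) / (n * L)  # its derivative w.r.t. lam
        thresh = lam / L
        active = np.abs(z) > thresh             # entries kept by soft-thresholding
        beta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
        # Chain rule through the soft-thresholding prox ST(z, lam/L):
        #   d/dlam ST = 1_{|z|>lam/L} * dz/dlam - sign(z) * 1_{|z|>lam/L} / L
        jac = np.where(active, dz - np.sign(z) / L, 0.0)
    return beta, jac
```

After enough iterations, `jac` restricted to the final support should approach the implicit-differentiation Jacobian computed in the earlier sketch, consistent with the convergence result stated in that entry.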
This list is automatically generated from the titles and abstracts of the papers on this site.