Gradient-based bilevel optimization for multi-penalty Ridge regression
through matrix differential calculus
- URL: http://arxiv.org/abs/2311.14182v1
- Date: Thu, 23 Nov 2023 20:03:51 GMT
- Title: Gradient-based bilevel optimization for multi-penalty Ridge regression
through matrix differential calculus
- Authors: Gabriele Maroni, Loris Cannelli, Dario Piga
- Abstract summary: We introduce a gradient-based approach to the problem of linear regression with l2-regularization.
We show that our approach outperforms LASSO, Ridge, and Elastic Net regression.
The analytical computation of the gradient proves to be more efficient in terms of computational time than automatic differentiation.
- Score: 0.46040036610482665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Common regularization algorithms for linear regression, such as LASSO and
Ridge regression, rely on a regularization hyperparameter that balances the
tradeoff between minimizing the fitting error and the norm of the learned model
coefficients. As this hyperparameter is scalar, it can be easily selected via
random or grid search optimizing a cross-validation criterion. However, using a
scalar hyperparameter limits the algorithm's flexibility and potential for
better generalization. In this paper, we address the problem of linear
regression with l2-regularization, where a different regularization
hyperparameter is associated with each input variable. We optimize these
hyperparameters using a gradient-based approach, wherein the gradient of a
cross-validation criterion with respect to the regularization hyperparameters
is computed analytically through matrix differential calculus. Additionally, we
introduce two strategies tailored for sparse model learning problems aiming at
reducing the risk of overfitting to the validation data. Numerical examples
demonstrate that our multi-hyperparameter regularization approach outperforms
LASSO, Ridge, and Elastic Net regression. Moreover, the analytical computation
of the gradient proves to be more efficient in terms of computational time
compared to automatic differentiation, especially when handling a large number
of input variables. Application to the identification of over-parameterized
Linear Parameter-Varying models is also presented.
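For readers who want to experiment with the idea, the following is a minimal NumPy sketch of the multi-penalty ridge problem and the analytic gradient of a hold-out validation criterion with respect to the per-feature penalties. The function names, the log-parameterization, and the plain gradient-descent loop are illustrative assumptions, not the authors' implementation (which also covers cross-validation criteria, the sparsity-oriented strategies, and the LPV application).

```python
import numpy as np

def multi_penalty_ridge(X_tr, y_tr, lam):
    # Closed-form ridge weights with a per-feature penalty vector lam (illustrative helper).
    A = X_tr.T @ X_tr + np.diag(lam)
    return np.linalg.solve(A, X_tr.T @ y_tr), A

def val_loss_and_grad(X_tr, y_tr, X_val, y_val, log_lam):
    # Hold-out MSE and its analytic gradient w.r.t. log-penalties (log keeps penalties positive).
    lam = np.exp(log_lam)
    w, A = multi_penalty_ridge(X_tr, y_tr, lam)
    r = X_val @ w - y_val
    # dJ/dlam_j = -(2 / n_val) * w_j * (A^{-1} X_val^T r)_j, using the symmetry of A.
    g_lam = -(2.0 / len(y_val)) * w * np.linalg.solve(A, X_val.T @ r)
    return np.mean(r ** 2), g_lam * lam  # chain rule for the log-parameterization

# Toy usage: plain gradient descent on the validation criterion (step size chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 10))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(120)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]
log_lam = np.zeros(X.shape[1])
for _ in range(200):
    loss, g = val_loss_and_grad(X_tr, y_tr, X_val, y_val, log_lam)
    log_lam -= 0.5 * g
print(f"validation MSE: {loss:.4f}, penalties: {np.round(np.exp(log_lam), 3)}")
```

Note that the weight computation and the gradient involve linear systems with the same matrix A, so a single factorization can serve both; this kind of shared structure is what can make an analytic expression cheaper than generic automatic differentiation when the number of input variables grows.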
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Stability-Adjusted Cross-Validation for Sparse Linear Regression [5.156484100374059]
Cross-validation techniques like k-fold cross-validation substantially increase the computational cost of sparse regression.
We propose selecting hyperparameters that minimize a weighted sum of a cross-validation metric and a model's output stability.
Our confidence adjustment procedure reduces test set error by 2%, on average, on 13 real-world datasets.
arXiv Detail & Related papers (2023-06-26T17:02:45Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - An adaptive shortest-solution guided decimation approach to sparse
high-dimensional linear regression [2.3759847811293766]
The algorithm is adapted from the shortest-solution guided decimation approach and is referred to as ASSD.
ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.
arXiv Detail & Related papers (2022-11-28T04:29:57Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made by using plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Sliced gradient-enhanced Kriging for high-dimensional function
approximation [2.8228516010000617]
Gradient-enhanced Kriging (GE-Kriging) is a well-established surrogate modelling technique for approximating expensive computational models.
It tends to become impractical for high-dimensional problems due to the size of the inherent correlation matrix.
A new method, called sliced GE-Kriging (SGE-Kriging), is developed in this paper for reducing the size of the correlation matrix.
The results show that the SGE-Kriging model features an accuracy and robustness comparable to the standard one but comes at a much lower training cost.
arXiv Detail & Related papers (2022-04-05T07:27:14Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Algorithmic inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
We show that the proposed algorithm asymptotically converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
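As a companion to the last entry, here is a rough sketch of the implicit-differentiation idea for Lasso hyperparameter gradients, written against scikit-learn's (1/(2n))·||y − Xβ||² + λ·||β||₁ convention. It only illustrates the support-restricted linear system and assumes the active set is locally constant in λ; the cited paper's contribution includes avoiding the explicit matrix solve, and the helper name below is an assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_hypergradient(X_tr, y_tr, X_val, y_val, lam):
    # Gradient of the hold-out MSE w.r.t. lam via implicit differentiation on the Lasso support
    # (illustrative helper, not the cited paper's matrix-inversion-free algorithm).
    n_tr, n_val = len(y_tr), len(y_val)
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X_tr, y_tr).coef_
    S = np.flatnonzero(beta)  # active set
    if S.size == 0:
        return beta, 0.0
    # Stationarity on the support: (1/n_tr) X_S^T (X_S beta_S - y_tr) + lam * sign(beta_S) = 0
    # => d beta_S / d lam = -n_tr * (X_S^T X_S)^{-1} sign(beta_S)
    dbeta_S = -n_tr * np.linalg.solve(X_tr[:, S].T @ X_tr[:, S], np.sign(beta[S]))
    r_val = X_val @ beta - y_val
    return beta, (2.0 / n_val) * (r_val @ X_val[:, S]) @ dbeta_S
```

Plugging such a gradient into any first-order optimizer over log λ gives a cheap alternative to grid search over the regularization path.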