Provably tuning the ElasticNet across instances
- URL: http://arxiv.org/abs/2207.10199v2
- Date: Mon, 15 Jan 2024 09:23:39 GMT
- Title: Provably tuning the ElasticNet across instances
- Authors: Maria-Florina Balcan, Mikhail Khodak, Dravyansh Sharma, Ameet
Talwalkar
- Abstract summary: We consider the problem of tuning the regularization parameters of Ridge regression, LASSO, and the ElasticNet across multiple problem instances.
Our results are the first general learning-theoretic guarantees for this important class of problems.
- Score: 53.0518090093538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An important unresolved challenge in the theory of regularization is to set
the regularization coefficients of popular techniques like the ElasticNet with
general provable guarantees. We consider the problem of tuning the
regularization parameters of Ridge regression, LASSO, and the ElasticNet across
multiple problem instances, a setting that encompasses both cross-validation
and multi-task hyperparameter optimization. We obtain a novel structural result
for the ElasticNet which characterizes the loss as a function of the tuning
parameters as a piecewise-rational function with algebraic boundaries. We use
this to bound the structural complexity of the regularized loss functions and
show generalization guarantees for tuning the ElasticNet regression
coefficients in the statistical setting. We also consider the more challenging
online learning setting, where we show vanishing average expected regret
relative to the optimal parameter pair. We further extend our results to tuning
classification algorithms obtained by thresholding regression fits regularized
by Ridge, LASSO, or ElasticNet. Our results are the first general
learning-theoretic guarantees for this important class of problems that avoid
strong assumptions on the data distribution. Furthermore, our guarantees hold
for both validation and popular information criterion objectives.
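The ElasticNet estimator referenced above minimizes $\|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$, and the paper asks how to choose the pair $(\lambda_1, \lambda_2)$ from data when many related regression instances must be solved. As a minimal sketch of that cross-instance tuning setup (not the paper's algorithm or its guarantees), the snippet below grid-searches a shared parameter pair that minimizes average validation loss over several synthetic instances; the data generator, grid, and helper names are illustrative, and scikit-learn's (alpha, l1_ratio) parameterization stands in for $(\lambda_1, \lambda_2)$.

```python
# Minimal sketch: pick one ElasticNet parameter pair that works well on average
# across several problem instances, via grid search on validation loss.
# Illustrative only -- not the paper's algorithm or analysis.
import numpy as np
from itertools import product
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_instance(n=100, d=20, noise=0.5):
    """Generate one (train, validation) regression instance with a sparse signal."""
    X = rng.normal(size=(n, d))
    beta = rng.normal(size=d) * (rng.random(d) < 0.3)  # sparse ground truth
    y = X @ beta + noise * rng.normal(size=n)
    split = n // 2
    return (X[:split], y[:split]), (X[split:], y[split:])

instances = [make_instance() for _ in range(10)]

# scikit-learn parameterizes the ElasticNet penalty as
#   alpha * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2),
# so each (alpha, l1_ratio) pair corresponds to one (lambda_1, lambda_2).
grid = list(product([0.01, 0.1, 1.0], [0.1, 0.5, 0.9]))

def avg_validation_loss(alpha, l1_ratio):
    """Average held-out squared error of this parameter pair over all instances."""
    losses = []
    for (X_tr, y_tr), (X_val, y_val) in instances:
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10_000)
        model.fit(X_tr, y_tr)
        losses.append(mean_squared_error(y_val, model.predict(X_val)))
    return float(np.mean(losses))

best = min(grid, key=lambda p: avg_validation_loss(*p))
print("selected (alpha, l1_ratio):", best)
```

In the paper's statistical setting the instances are drawn from an unknown distribution and the guarantee concerns how well a parameter pair tuned this way generalizes to fresh instances; the online setting instead bounds the average expected regret of a sequential selection rule against the best fixed pair in hindsight.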
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Gradient-based bilevel optimization for multi-penalty Ridge regression
through matrix differential calculus [0.46040036610482665]
We introduce a gradient-based approach to the problem of linear regression with $\ell_2$-regularization.
We show that our approach outperforms LASSO, Ridge, and Elastic Net regression.
The analytical computation of the gradient proves more efficient in terms of computational time than automatic differentiation.
arXiv Detail & Related papers (2023-11-23T20:03:51Z) - Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
arXiv Detail & Related papers (2023-10-25T13:55:35Z) - Regularization, early-stopping and dreaming: a Hopfield-like setup to
address generalization and overfitting [0.0]
We look for optimal network parameters by applying a gradient descent over a regularized loss function.
Within this framework, the optimal neuron-interaction matrices correspond to Hebbian kernels revised by a reiterated unlearning protocol.
arXiv Detail & Related papers (2023-08-01T15:04:30Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - The Benefits of Implicit Regularization from SGD in Least Squares
Problems [116.85246178212616]
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice.
We make comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression.
arXiv Detail & Related papers (2021-08-10T09:56:47Z) - Support estimation in high-dimensional heteroscedastic mean regression [2.28438857884398]
We consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors.
We use a strictly convex, smooth variant of the Huber loss function with tuning parameter depending on the parameters of the problem.
For the resulting estimator we show sign-consistency and optimal rates of convergence in the $\ell_\infty$ norm.
arXiv Detail & Related papers (2020-11-03T09:46:31Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z) - A generalized linear joint trained framework for semi-supervised
learning of sparse features [4.511923587827301]
The elastic-net is among the most widely used types of regularization algorithms.
This paper introduces a novel solution for semi-supervised learning of sparse features in the context of generalized linear model estimation.
arXiv Detail & Related papers (2020-06-02T14:44:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.