Related papers: Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

URL: http://arxiv.org/abs/2507.05084v1
Date: Mon, 07 Jul 2025 15:08:45 GMT
Title: Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks
Authors: Maria-Florina Balcan, Saumya Goyal, Dravyansh Sharma,
Abstract summary: We find distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients.<n>We extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account the mean of the ground truth distribution.
Score: 24.2043855572415
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern regression problems often involve high-dimensional data and a careful tuning of the regularization hyperparameters is crucial to avoid overly complex models that may overfit the training data while guaranteeing desirable properties like effective variable selection. We study the recently introduced direction of tuning regularization hyperparameters in linear regression across multiple related tasks. We obtain distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients, including ridge, lasso and the elastic net. In contrast, prior work develops bounds that apply uniformly to all distributions, but such bounds necessarily degrade with feature dimension, d. While these bounds are shown to be tight for worst-case distributions, our bounds improve with the "niceness" of the data distribution. Concretely, we show that under additional assumptions that instances within each task are i.i.d. draws from broad well-studied classes of distributions including sub-Gaussians, our generalization bounds do not get worse with increasing d, and are much sharper than prior work for very large d. We also extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account an estimate of the mean of the ground truth distribution.

Related papers

Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models [33.36787620121057]
We prove a new generalization bound that shows for any class of linear predictors in Gaussian space. We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021) We show that application of our bound generalization using localized Gaussian width will generally be sharp for empirical risk minimizers.
arXiv Detail & Related papers (2022-10-21T16:16:55Z)
Robustness Implies Generalization via Data-Dependent Generalization Bounds [24.413499775513145]
This paper proves that robustness implies generalization via data-dependent generalization bounds. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable.
arXiv Detail & Related papers (2022-06-27T17:58:06Z)
On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD) We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting. We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
The Benefits of Implicit Regularization from SGD in Least Squares Problems [116.85246178212616]
gradient descent (SGD) exhibits strong algorithmic regularization effects in practice. We make comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression.
arXiv Detail & Related papers (2021-08-10T09:56:47Z)
Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically. This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression. We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks. In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings. We show that the generic approach we identified (squared $ell$ regularized augmentation) outperforms several recent methods, which are each specially designed for one task.
arXiv Detail & Related papers (2020-11-25T22:40:09Z)
Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA) We introduce the LIRR algorithm for jointly textbfLearning textbfInvariant textbfRepresentations and textbfRisks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
The Heavy-Tail Phenomenon in SGD [7.366405857677226]
We show that depending on the structure of the Hessian of the loss at the minimum, the SGD iterates will converge to a emphheavy-tailed stationary distribution. We translate our results into insights about the behavior of SGD in deep learning.
arXiv Detail & Related papers (2020-06-08T16:43:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.