Optimistic Rates: A Unifying Theory for Interpolation Learning and
Regularization in Linear Regression
- URL: http://arxiv.org/abs/2112.04470v1
- Date: Wed, 8 Dec 2021 18:55:00 GMT
- Title: Optimistic Rates: A Unifying Theory for Interpolation Learning and
Regularization in Linear Regression
- Authors: Lijia Zhou and Frederic Koehler and Danica J. Sutherland and Nathan
Srebro
- Abstract summary: We study a localized notion of uniform convergence known as an "optimistic rate".
Our refined analysis avoids the hidden constant and logarithmic factor in existing results.
- Score: 35.78863301525758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a localized notion of uniform convergence known as an "optimistic
rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian
data. Our refined analysis avoids the hidden constant and logarithmic factor in
existing results, which are known to be crucial in high-dimensional settings,
especially for understanding interpolation learning. As a special case, our
analysis recovers the guarantee from Koehler et al. (2021), which tightly
characterizes the population risk of low-norm interpolators under the benign
overfitting conditions. Our optimistic rate bound, though, also analyzes
predictors with arbitrary training error. This allows us to recover some
classical statistical guarantees for ridge and LASSO regression under random
designs, and helps us obtain a precise understanding of the excess risk of
near-interpolators in the over-parameterized regime.
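As a rough illustration of the objects discussed above (not the paper's analysis or bound), the sketch below simulates a Gaussian design with one strong "signal" direction and many low-variance "junk" directions, in which the minimum-l2-norm interpolator fits the noisy training data exactly yet ends up with a much smaller excess risk than the zero predictor in this configuration. All sizes, the covariance split, and the noise level are arbitrary assumptions made for illustration.

    # Minimal sketch, assuming a "strong signal + low-variance junk" Gaussian design;
    # illustrates low-norm interpolation, not the paper's construction or bound.
    import numpy as np

    rng = np.random.default_rng(0)
    n, k, d_junk, sigma = 100, 1, 6000, 0.5          # assumed sample size, signal dims, junk dims, noise
    junk_var = 20.0 / d_junk                         # many tiny-variance junk directions
    cov_diag = np.concatenate([np.ones(k), np.full(d_junk, junk_var)])
    w_star = np.concatenate([np.ones(k), np.zeros(d_junk)])  # true signal lives on the strong direction

    X = rng.standard_normal((n, k + d_junk)) * np.sqrt(cov_diag)
    y = X @ w_star + sigma * rng.standard_normal(n)

    w_mn = np.linalg.pinv(X) @ y                     # minimum-l2-norm interpolator
    diff = w_mn - w_star
    excess_risk = diff @ (cov_diag * diff)           # (w - w*)^T Sigma (w - w*)

    print(f"training MSE (should be ~0): {np.mean((X @ w_mn - y) ** 2):.2e}")
    print(f"excess risk of min-norm interpolator: {excess_risk:.3f}")
    print(f"excess risk of the zero predictor:    {w_star @ (cov_diag * w_star):.3f}")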
Related papers
- On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds [11.30047438005394]
This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification.
We quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data-scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.
arXiv Detail & Related papers (2024-10-21T14:53:12Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high-dimensional datasets. (A toy simulation of correlated-sample ridge regression is sketched after this entry.)
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
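As a hedged, purely illustrative companion to the entry above (not the paper's exact asymptotics), the sketch below Monte Carlo-estimates the out-of-sample risk of ridge regression when the training rows are correlated through an AR(1) structure. The problem sizes, correlation level rho, and ridge penalty lam are arbitrary choices, and the in-sample and correlated-test-point analyses the paper covers are not modeled.

    # Minimal sketch, assuming an AR(1) correlation across training samples and an
    # independent isotropic test point; illustrative only, not the paper's formulas.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, sigma, lam, rho = 80, 120, 0.5, 1.0, 0.6   # assumed sizes, noise, ridge penalty, correlation

    # AR(1) correlation across the n samples (rows); the features themselves are isotropic.
    row_corr = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    L = np.linalg.cholesky(row_corr)
    w_star = rng.standard_normal(d) / np.sqrt(d)

    risks = []
    for _ in range(200):
        X = L @ rng.standard_normal((n, d))          # rows correlated via L
        eps = sigma * (L @ rng.standard_normal(n))   # noise correlated the same way
        y = X @ w_star + eps
        w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        # Out-of-sample risk for an independent test point x ~ N(0, I_d):
        risks.append(sigma ** 2 + np.sum((w_ridge - w_star) ** 2))

    print(f"Monte Carlo out-of-sample risk of ridge (lambda={lam}): {np.mean(risks):.3f}")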
- Precise analysis of ridge interpolators under heavy correlations -- a Random Duality Theory view [0.0]
We show that Random Duality Theory (RDT) can be utilized to obtain precise closed-form characterizations of all optimizing quantities of interest related to these estimators.
arXiv Detail & Related papers (2024-06-13T14:56:52Z)
- A Statistical Theory of Regularization-Based Continual Learning [10.899175512941053]
We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks.
We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously.
A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization; a toy illustration of this correspondence is sketched after this entry.
arXiv Detail & Related papers (2024-06-10T12:25:13Z)
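As a hedged toy illustration of the early-stopping / $\ell_2$-regularization correspondence mentioned above (a classical heuristic, not this paper's exact statement), the sketch below runs full-batch gradient descent on least squares and compares the iterate after t steps with the ridge solution at penalty roughly 1/(step size * t). All problem sizes are arbitrary assumptions.

    # Minimal sketch of the classical correspondence "early stopping ~ l2 regularization"
    # on least squares; the matching penalty 1/(eta*t) is a standard heuristic.
    import numpy as np

    rng = np.random.default_rng(2)
    n, d, sigma = 200, 50, 0.5                       # assumed problem sizes and noise level
    X = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ w_star + sigma * rng.standard_normal(n)

    eta = 1.0 / np.linalg.norm(X, 2) ** 2            # stable step size: 1 / (largest singular value)^2
    w_gd = np.zeros(d)
    for t in range(1, 201):
        w_gd -= eta * (X.T @ (X @ w_gd - y))         # full-batch gradient step on 0.5*||Xw - y||^2
        if t in (10, 50, 200):
            lam = 1.0 / (eta * t)                    # heuristic matching ridge penalty
            w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
            gap = np.linalg.norm(w_gd - w_ridge) / np.linalg.norm(w_ridge)
            print(f"t={t:3d}  relative distance to ridge(1/(eta*t)): {gap:.3f}")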
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z)
- Differentially Private Regression with Unbounded Covariates [19.646866014320608]
We provide differentially private algorithms for the classical regression settings of Least Squares Fitting, Binary Regression and Linear Regression.
We capture the fundamental and widely-studied models of logistic regression and linearly-separable SVMs, learning an unbiased estimate of the true regression vector.
arXiv Detail & Related papers (2022-02-19T17:31:38Z)
- Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares; a toy comparison is sketched after this entry.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
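As a hedged toy comparison related to the entry above (the setup and all constants are my assumptions, not the paper's setting or analysis), the sketch below runs one pass of constant-stepsize SGD with tail averaging on a Gaussian linear regression problem and reports its excess risk next to that of ordinary least squares.

    # Minimal sketch, assuming isotropic Gaussian data and an arbitrary constant step size;
    # illustrative comparison of tail-averaged constant-stepsize SGD with ordinary least squares.
    import numpy as np

    rng = np.random.default_rng(3)
    n, d, sigma, eta = 2000, 20, 0.5, 0.01           # assumed sample size, dimension, noise, step size
    X = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ w_star + sigma * rng.standard_normal(n)

    w = np.zeros(d)
    tail_sum = np.zeros(d)
    for i in range(n):                               # single pass, one sample per step
        w -= eta * (X[i] @ w - y[i]) * X[i]          # constant-stepsize SGD update for squared loss
        if i >= n // 2:
            tail_sum += w                            # average the second half of the iterates
    w_sgd = tail_sum / (n - n // 2)

    w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    print(f"excess risk, tail-averaged SGD: {np.sum((w_sgd - w_star) ** 2):.4f}")
    print(f"excess risk, OLS:               {np.sum((w_ols - w_star) ** 2):.4f}")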