Shrinkage to Infinity: Reducing Test Error by Inflating the Minimum Norm Interpolator in Linear Models
- URL: http://arxiv.org/abs/2510.19206v1
- Date: Wed, 22 Oct 2025 03:30:27 GMT
- Title: Shrinkage to Infinity: Reducing Test Error by Inflating the Minimum Norm Interpolator in Linear Models
- Authors: Jake Freeman,
- Abstract summary: Hastie et al. (2022) found that ridge regularization is essential in high dimensional linear regression $y=\beta^Tx + \epsilon$. We make precise this observation for linear regression with highly anisotropic covariances and diverging $d/n$. We find that simply scaling up (or inflating) the minimum $\ell_2$ norm interpolator by a constant greater than one can improve the generalization error.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hastie et al. (2022) found that ridge regularization is essential in high dimensional linear regression $y=\beta^Tx + \epsilon$ with isotropic co-variates $x\in \mathbb{R}^d$ and $n$ samples at fixed $d/n$. However, Hastie et al. (2022) also notes that when the co-variates are anisotropic and $\beta$ is aligned with the top eigenvalues of population covariance, the "situation is qualitatively different." In the present article, we make precise this observation for linear regression with highly anisotropic covariances and diverging $d/n$. We find that simply scaling up (or inflating) the minimum $\ell_2$ norm interpolator by a constant greater than one can improve the generalization error. This is in sharp contrast to traditional regularization/shrinkage prescriptions. Moreover, we use a data-splitting technique to produce consistent estimators that achieve generalization error comparable to that of the optimally inflated minimum-norm interpolator. Our proof relies on apparently novel matching upper and lower bounds for expectations of Gaussian random projections for a general class of anisotropic covariance matrices when $d/n\to \infty$.
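The inflation idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: the spiked diagonal covariance, the sample sizes, the noise level, and the helper names (`min_norm_interpolator`, `excess_risk`, `best_inflation`) are all assumptions chosen for illustration. It computes the minimum $\ell_2$ norm interpolator with a pseudoinverse, then finds the oracle constant $c$ that minimizes the excess risk $(c\hat\beta-\beta)^T\Sigma(c\hat\beta-\beta)$, which is a one-dimensional quadratic in $c$.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum l2-norm solution of X b = y (assumes n <= d and full row rank)."""
    return np.linalg.pinv(X) @ y


def excess_risk(b, beta, cov_diag):
    """Excess test error (b - beta)^T Sigma (b - beta) for a diagonal Sigma."""
    diff = b - beta
    return float(np.sum(cov_diag * diff**2))


def best_inflation(b_hat, beta, cov_diag):
    """Oracle c minimizing excess_risk(c * b_hat, beta, cov_diag).

    Uses the true beta, so it is a diagnostic, not a usable estimator.
    """
    return float(np.sum(cov_diag * b_hat * beta) / np.sum(cov_diag * b_hat**2))


# Hypothetical setup: one large eigenvalue, beta aligned with it, d much larger than n.
rng = np.random.default_rng(0)
n, d = 50, 2000
cov_diag = np.full(d, 0.05)
cov_diag[0] = 25.0
beta = np.zeros(d)
beta[0] = 1.0

X = rng.standard_normal((n, d)) * np.sqrt(cov_diag)  # rows ~ N(0, diag(cov_diag))
y = X @ beta + 0.5 * rng.standard_normal(n)

b_hat = min_norm_interpolator(X, y)
c_star = best_inflation(b_hat, beta, cov_diag)
risk_plain = excess_risk(b_hat, beta, cov_diag)
risk_inflated = excess_risk(c_star * b_hat, beta, cov_diag)

print(f"oracle inflation constant: {c_star:.3f}")
print(f"excess risk, c = 1:        {risk_plain:.4f}")
print(f"excess risk, c = c_star:   {risk_inflated:.4f}")
```

Because the oracle constant depends on the unknown $\beta$, this only shows how much could be gained by rescaling. The abstract states that the paper instead uses a data-splitting technique to obtain estimators with comparable error; that procedure is not reproduced here.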
Related papers
- Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences. We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices. We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z) - Singular Bayesian Neural Networks [1.2891210250935148]
Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We induce a posterior that is singular with respect to the Lebesgue measure, concentrating on the rank-$r$ manifold. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{mn}$, and prove loss bounds that decompose the error into optimization and rank-induced bias.
arXiv Detail & Related papers (2026-01-30T23:06:34Z) - Convergence Rate Analysis of LION [54.28350823319057]
LION converges at a rate of $\mathcal{O}(\sqrt{d}K^{-\ldots})$, as measured by the gradient Karush-Kuhn-Tucker (KKT) criterion.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z) - Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup. We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$. Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Near-Interpolators: Rapid Norm Growth and the Trade-Off between
Interpolation and Generalization [28.02367842438021]
We study the generalization capability of nearly-interpolating linear regressors.
For $\tau$ fixed, $\boldsymbol{\beta}$ has squared $\ell_2$-norm $\mathbb{E}[\|\boldsymbol{\beta}\|_2^2]$.
We show empirically that a similar phenomenon holds for nearly-interpolating shallow neural networks.
arXiv Detail & Related papers (2024-03-12T02:47:00Z) - Implicit Regularization Leads to Benign Overfitting for Sparse Linear
Regression [16.551664358490658]
In deep learning, often the training process finds an interpolator (a solution with 0 training loss) but the test loss is still low.
One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties for the interpolator.
We show that training our new model via gradient descent leads to an interpolator with near-optimal test loss.
arXiv Detail & Related papers (2023-02-01T05:41:41Z) - Dimension free ridge regression [10.434481202633458]
We revisit ridge regression on i.i.d. data and characterize its bias and variance in terms of the bias and variance of an 'equivalent' sequence model. As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum.
arXiv Detail & Related papers (2022-10-16T16:01:05Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting [35.78863301525758]
We prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class.
Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators.
We show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.
arXiv Detail & Related papers (2021-06-17T06:58:10Z) - The Lasso with general Gaussian designs with applications to hypothesis testing [21.342900543543816]
The Lasso is a method for high-dimensional regression.
We show that the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large.
arXiv Detail & Related papers (2020-07-27T17:48:54Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.