A Regularization-Sharpness Tradeoff for Linear Interpolators
- URL: http://arxiv.org/abs/2602.12680v1
- Date: Fri, 13 Feb 2026 07:21:08 GMT
- Title: A Regularization-Sharpness Tradeoff for Linear Interpolators
- Authors: Qingyi Hu, Liam Hodgkinson
- Abstract summary: We propose a regularization-sharpness tradeoff for overparameterized linear regression with an $\ell^p$ penalty. Inspired by the interpolating information criterion, our framework decomposes the selection penalty into a regularization term and a geometric sharpness term. Building on prior analyses that established this information criterion for ridge regularizers, this work first provides a general expression of the interpolating information criterion for $\ell^p$ regularizers where $p \ge 2$.
- Score: 8.628516727959259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rule of thumb regarding the relationship between the bias-variance tradeoff and model size plays a key role in classical machine learning, but is now well-known to break down in the overparameterized setting as per the double descent curve. In particular, minimum-norm interpolating estimators can perform well, suggesting the need for a new tradeoff in these settings. Accordingly, we propose a regularization-sharpness tradeoff for overparameterized linear regression with an $\ell^p$ penalty. Inspired by the interpolating information criterion, our framework decomposes the selection penalty into a regularization term (quantifying the alignment of the regularizer and the interpolator) and a geometric sharpness term on the interpolating manifold (quantifying the effect of local perturbations), yielding a tradeoff analogous to bias-variance. Building on prior analyses that established this information criterion for ridge regularizers, this work first provides a general expression of the interpolating information criterion for $\ell^p$ regularizers where $p \ge 2$. Subsequently, we extend this to the LASSO interpolator with $\ell^1$ regularizer, which induces stronger sparsity. Empirical results on real-world datasets with random Fourier features and polynomials validate our theory, demonstrating how the tradeoff terms can distinguish performant linear interpolators from weaker ones.
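As a rough illustration of the objects contrasted in the abstract (not code from the paper), the sketch below fits an overparameterized random-Fourier-feature regression and computes two linear interpolators: the minimum $\ell^2$-norm interpolator and the minimum $\ell^1$-norm (basis-pursuit / LASSO-limit) interpolator. The feature map, sparse ground truth, sample sizes, and noise level are illustrative assumptions; the snippet does not compute the interpolating information criterion itself.

```python
# Minimal sketch (illustrative assumptions, not the paper's experiments):
# compare the minimum l2-norm and minimum l1-norm interpolators on an
# overparameterized random-Fourier-feature regression problem.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d, p = 40, 5, 400          # n samples, input dim d, p >> n features

# Random Fourier features approximating an RBF kernel.
W = rng.normal(size=(d, p))
b = rng.uniform(0, 2 * np.pi, size=p)
def rff(X):
    return np.sqrt(2.0 / p) * np.cos(X @ W + b)

# Sparse "ground truth" in feature space plus label noise (an assumption).
beta_true = np.zeros(p)
beta_true[rng.choice(p, size=10, replace=False)] = rng.normal(size=10)
X_tr, X_te = rng.normal(size=(n, d)), rng.normal(size=(200, d))
Phi_tr, Phi_te = rff(X_tr), rff(X_te)
y_tr = Phi_tr @ beta_true + 0.1 * rng.normal(size=n)
y_te = Phi_te @ beta_true

# Minimum l2-norm interpolator: beta = Phi^+ y (pseudo-inverse solution).
beta_l2 = np.linalg.pinv(Phi_tr) @ y_tr

# Minimum l1-norm interpolator (basis pursuit): min ||beta||_1 s.t. Phi beta = y,
# via the standard LP reformulation beta = u - v with u, v >= 0.
c = np.ones(2 * p)
A_eq = np.hstack([Phi_tr, -Phi_tr])
res = linprog(c, A_eq=A_eq, b_eq=y_tr, bounds=(0, None), method="highs")
beta_l1 = res.x[:p] - res.x[p:]

for name, beta in [("min l2-norm", beta_l2), ("min l1-norm", beta_l1)]:
    train_err = np.mean((Phi_tr @ beta - y_tr) ** 2)
    test_err = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"{name}: train MSE {train_err:.2e}, test MSE {test_err:.3f}")
```

Both estimators interpolate the training data; which one predicts better depends on how well the penalty's geometry aligns with the ground truth, the kind of alignment the regularization term of the tradeoff is intended to capture.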
Related papers
- Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences. We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices. We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z) - On the Computational Complexity of Performative Prediction [59.584063949469645]
We show that computing an $\epsilon$-performatively stable point is PPAD-complete. One of our key technical contributions is to extend this PPAD-hardness result to general convex domains.
arXiv Detail & Related papers (2026-01-28T02:26:35Z) - Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy [13.264499801590583]
We introduce new high-probability spectral-norm perturbation bounds for a symmetric matrix $A \in \mathbb{R}^{n \times n}$ and an arbitrary symmetric perturbation $E$. Under mild eigengap and norm conditions, our bounds yield sharp estimates for $\|(A + E)_p - A_p\|$, with improvements of up to a factor of $\sqrt{n}$. As an application, we derive improved utility guarantees for differentially private PCA, resolving an open problem in the literature.
arXiv Detail & Related papers (2025-10-29T16:36:00Z) - Shrinkage to Infinity: Reducing Test Error by Inflating the Minimum Norm Interpolator in Linear Models [0.0]
Hastie et al. (2022) found that ridge regularization is essential in high-dimensional linear regression $y = \beta^T x + \epsilon$. We make this observation precise for linear regression with highly anisotropic covariances and large $d/n$. We find that simply scaling up (or inflating) the minimum $\ell_2$-norm interpolator by a constant greater than one can improve the generalization error (see the sketch after this list).
arXiv Detail & Related papers (2025-10-22T03:30:27Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer
Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - Implicit Regularization Leads to Benign Overfitting for Sparse Linear
Regression [16.551664358490658]
In deep learning, often the training process finds an interpolator (a solution with 0 training loss) but the test loss is still low.
One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties for the interpolator.
We show that training our new model via gradient descent leads to an interpolator with near-optimal test loss.
arXiv Detail & Related papers (2023-02-01T05:41:41Z) - A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized
Linear Models [33.36787620121057]
We prove a new generalization bound for any class of linear predictors in Gaussian space.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021).
We show that application of our generalization bound using localized Gaussian width will generally be sharp for empirical risk minimizers.
arXiv Detail & Related papers (2022-10-21T16:16:55Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - Fast rates for noisy interpolation require rethinking the effects of
inductive bias [8.946655323517092]
Good generalization performance on high-dimensional data hinges on a simple structure of the ground truth and a strong inductive bias of the estimator.
Our results suggest that, while a stronger inductive bias encourages a simpler structure that is more aligned with the ground truth, it also increases the detrimental effect of noise.
arXiv Detail & Related papers (2022-03-07T18:44:47Z) - Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple
descent [19.781475554462553]
This paper pursues theoretical understanding for an important type of interpolators: the minimum $\ell_1$-norm interpolator.
We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon.
Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.
arXiv Detail & Related papers (2021-10-18T17:51:14Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z)
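As a companion to the "Shrinkage to Infinity" entry above, here is a minimal sketch of the inflation idea under assumed settings (the anisotropic spectrum, noise level, and grid of scaling constants are illustrative choices, not the paper's experimental setup): compute the minimum $\ell_2$-norm interpolator and rescale it by constants $\alpha \ge 1$, comparing test error.

```python
# Minimal sketch of inflating the minimum l2-norm interpolator.
# All problem sizes and the covariance spectrum below are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 500                              # overparameterized: d >> n
eigs = 1.0 / np.arange(1, d + 1) ** 2       # highly anisotropic covariance spectrum
beta_true = rng.normal(size=d) * np.sqrt(eigs)

X_tr = rng.normal(size=(n, d)) * np.sqrt(eigs)
X_te = rng.normal(size=(2000, d)) * np.sqrt(eigs)
y_tr = X_tr @ beta_true + 0.5 * rng.normal(size=n)

beta_mn = np.linalg.pinv(X_tr) @ y_tr        # minimum l2-norm interpolator
for alpha in [1.0, 1.5, 2.0, 3.0]:
    # Inflate the interpolator by alpha and measure test error on noiseless targets.
    test_mse = np.mean((X_te @ (alpha * beta_mn) - X_te @ beta_true) ** 2)
    print(f"alpha = {alpha:.1f}: test MSE {test_mse:.4f}")
```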