Fast rates for noisy interpolation require rethinking the effects of
inductive bias
- URL: http://arxiv.org/abs/2203.03597v1
- Date: Mon, 7 Mar 2022 18:44:47 GMT
- Title: Fast rates for noisy interpolation require rethinking the effects of
inductive bias
- Authors: Konstantin Donhauser, Nicolo Ruggeri, Stefan Stojanovic and Fanny Yang
- Abstract summary: Good generalization performance on high-dimensional data hinges on a simple structure of the ground truth and a strong inductive bias of the estimator.
Our results suggest that, while a stronger inductive bias encourages a simpler structure that is more aligned with the ground truth, it also increases the detrimental effect of noise.
- Score: 8.946655323517092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Good generalization performance on high-dimensional data crucially hinges on
a simple structure of the ground truth and a corresponding strong inductive
bias of the estimator. Even though this intuition is valid for regularized
models, in this paper we caution against a strong inductive bias for
interpolation in the presence of noise: Our results suggest that, while a
stronger inductive bias encourages a simpler structure that is more aligned
with the ground truth, it also increases the detrimental effect of noise.
Specifically, for both linear regression and classification with a sparse
ground truth, we prove that minimum $\ell_p$-norm and maximum $\ell_p$-margin
interpolators achieve fast polynomial rates up to order $1/n$ for $p > 1$
compared to a logarithmic rate for $p = 1$. Finally, we provide experimental
evidence that this trade-off may also play a crucial role in understanding
non-linear interpolating models used in practice.
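To make the estimators discussed in the abstract concrete, the following is a minimal sketch (not code from the paper) of the minimum $\ell_p$-norm interpolator for noisy linear regression with a sparse ground truth. It assumes numpy and cvxpy as generic tools; the problem sizes, sparsity level, and noise level are illustrative placeholders, not the paper's experimental setup.

    import numpy as np
    import cvxpy as cp

    # Illustrative overparameterized setup (d > n): sparse ground truth, Gaussian
    # features, additive label noise. All constants are placeholders, not the
    # paper's experimental configuration.
    rng = np.random.default_rng(0)
    n, d, s, sigma = 50, 200, 3, 0.5
    X = rng.standard_normal((n, d))
    w_star = np.zeros(d)
    w_star[:s] = 1.0
    y = X @ w_star + sigma * rng.standard_normal(n)

    def min_lp_norm_interpolator(X, y, p):
        # argmin_w ||w||_p  subject to  X w = y  (fits the noisy labels exactly)
        w = cp.Variable(X.shape[1])
        cp.Problem(cp.Minimize(cp.norm(w, p)), [X @ w == y]).solve()
        return w.value

    # Compare a strong sparsity-inducing bias (p = 1) with weaker biases (p > 1).
    for p in (1.0, 1.1, 2.0):
        w_hat = min_lp_norm_interpolator(X, y, p)
        err = np.sum((w_hat - w_star) ** 2)  # parameter error as a proxy for risk
        print(f"p = {p}: ||w_hat - w*||_2^2 = {err:.3f}")

The maximum $\ell_p$-margin interpolator for classification can be sketched analogously by minimizing $\|w\|_p$ subject to the margin constraints $y_i \langle x_i, w \rangle \ge 1$ for all $i$.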
Related papers
- Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees [56.80920351680438]
We study high-probability convergence in online learning, in the presence of heavy-tailed noise.
Compared to state-of-the-art results, which only consider clipping and require noise with bounded $p$-th moments, we provide guarantees for a broad class of nonlinearities.
arXiv Detail & Related papers (2024-10-17T18:25:28Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of minimum-trace-of-the-Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z)
- Strong inductive biases provably prevent harmless interpolation [8.946655323517092]
This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator's inductive bias.
Our main theoretical result establishes tight non-asymptotic bounds for high-dimensional kernel regression.
arXiv Detail & Related papers (2023-01-18T15:37:11Z)
- Intrinsic dimensionality and generalization properties of the $\mathcal{R}$-norm inductive bias [4.37441734515066]
The $\mathcal{R}$-norm is the basis of an inductive bias for two-layer neural networks.
We find that these interpolants are intrinsically multivariate functions, even when there are ridge functions that fit the data.
arXiv Detail & Related papers (2022-06-10T18:33:15Z)
- Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent [19.781475554462553]
This paper pursues a theoretical understanding of an important type of interpolator: the minimum $\ell_1$-norm interpolator.
We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon.
Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.
arXiv Detail & Related papers (2021-10-18T17:51:14Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk (a minimal sketch of this setup appears after this list).
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve robust generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the role of noise in its success is still unclear.
We show that heavy tails commonly arise in the parameters as a result of multiplicative noise.
A detailed analysis of key factors, including step size and data, is conducted, with similar behavior observed across state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
- A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
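As referenced in the two-layer linear network entry above, the following is a minimal sketch of that setup under simplifying assumptions: gradient flow is approximated by plain gradient descent with a small constant step size, and the data model, widths, and all constants are illustrative placeholders rather than anything taken from that paper.

    import numpy as np

    # Two-layer linear network f(x) = x @ W1 @ w2 trained by gradient descent on
    # the squared loss, as a crude stand-in for gradient flow (small constant
    # step size). Data model, widths, and constants are illustrative placeholders.
    rng = np.random.default_rng(1)
    n, d, m, sigma = 40, 100, 50, 0.3
    X = rng.standard_normal((n, d))
    theta_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ theta_star + sigma * rng.standard_normal(n)

    W1 = 0.01 * rng.standard_normal((d, m))  # small initialization
    w2 = 0.01 * rng.standard_normal(m)
    lr, steps = 5e-3, 20000

    for _ in range(steps):
        residual = X @ W1 @ w2 - y                     # shape (n,)
        grad_W1 = X.T @ np.outer(residual, w2) / n     # d/dW1 of (1/2n)||residual||^2
        grad_w2 = W1.T @ (X.T @ residual) / n          # d/dw2 of the same loss
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2

    theta_hat = W1 @ w2                                # end-to-end linear predictor
    train_mse = np.mean((X @ theta_hat - y) ** 2)
    param_err = np.sum((theta_hat - theta_star) ** 2)  # proxy for excess risk
    print(f"train MSE = {train_mse:.5f}, ||theta_hat - theta*||_2^2 = {param_err:.4f}")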