Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and
Benign Overfitting
- URL: http://arxiv.org/abs/2106.09276v1
- Date: Thu, 17 Jun 2021 06:58:10 GMT
- Title: Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and
Benign Overfitting
- Authors: Frederic Koehler and Lijia Zhou and Danica J. Sutherland and Nathan
Srebro
- Abstract summary: We prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class.
Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators.
We show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.
- Score: 35.78863301525758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider interpolation learning in high-dimensional linear regression with
Gaussian data, and prove a generic uniform convergence guarantee on the
generalization error of interpolators in an arbitrary hypothesis class in terms
of the class's Gaussian width. Applying the generic bound to Euclidean norm
balls recovers the consistency result of Bartlett et al. (2020) for
minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for
near-minimal-norm interpolators in the special case of Gaussian data. We
demonstrate the generality of the bound by applying it to the simplex,
obtaining a novel consistency result for minimum l1-norm interpolators (basis
pursuit). Our results show how norm-based generalization bounds can explain and
be used to analyze benign overfitting, at least in some settings.
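As a rough numerical companion to this setting (a minimal sketch, not the paper's proof technique; the dimensions, sparse ground truth, and noise level below are hypothetical choices), the snippet computes the minimum l2-norm interpolator via the pseudoinverse and the minimum l1-norm interpolator (basis pursuit) via a standard linear program, then reports training error and excess risk under an isotropic Gaussian design.

```python
# Minimal sketch: minimum l2-norm and minimum l1-norm (basis pursuit)
# interpolators on Gaussian data.  All sizes, the sparse ground truth, and the
# noise level are hypothetical choices for illustration only.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 50, 400                        # overparameterized regime: d >> n
w_star = np.zeros(d)
w_star[:5] = 1.0                      # sparse ground truth (assumption)
X = rng.standard_normal((n, d))       # isotropic Gaussian design
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Minimum l2-norm interpolator: w = X^+ y.
w_l2 = np.linalg.pinv(X) @ y

# Minimum l1-norm interpolator (basis pursuit) as a linear program:
# write w = u - v with u, v >= 0 and minimize sum(u + v) subject to X(u - v) = y.
c = np.ones(2 * d)
res = linprog(c, A_eq=np.hstack([X, -X]), b_eq=y, bounds=(0, None), method="highs")
w_l1 = res.x[:d] - res.x[d:]

for name, w in [("min l2-norm", w_l2), ("min l1-norm", w_l1)]:
    train_mse = np.mean((X @ w - y) ** 2)    # ~0: both interpolate the data
    excess = np.sum((w - w_star) ** 2)       # excess risk for isotropic Gaussian x
    print(f"{name}: train MSE = {train_mse:.2e}, excess risk = {excess:.3f}")
```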
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions learned by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Precise Asymptotic Generalization for Multiclass Classification with
Overparameterized Linear Models [4.093769373833101]
We resolve the conjecture posed in Subramanian et al. (2022), in the regime where the number of data points, features, and classes all grow together.
Our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1 asymptotically.
The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels.
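For context, the standard Hanson-Wright inequality (the paper's new variant for sparse multiclass labels is not reproduced here) states that for a random vector $x$ with independent, mean-zero, $K$-sub-Gaussian entries and any fixed matrix $A$,
$$\Pr\bigl(\lvert x^\top A x - \mathbb{E}\,x^\top A x\rvert > t\bigr) \le 2\exp\!\Bigl(-c\,\min\Bigl(\tfrac{t^2}{K^4\lVert A\rVert_F^2},\ \tfrac{t}{K^2\lVert A\rVert}\Bigr)\Bigr)$$
for all $t \ge 0$ and a universal constant $c > 0$.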
arXiv Detail & Related papers (2023-06-23T00:59:15Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
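As a rough illustration of the quantity such bounds depend on (a hypothetical helper, not the paper's method), the local Lipschitz regularity of a learned predictor around a data point can be lower-bounded empirically by probing random perturbations:

```python
# Minimal sketch: empirically lower-bound the local Lipschitz constant of a
# predictor f near a point x by sampling random directions at a small radius.
# The toy ReLU network and all sizes are hypothetical choices.
import numpy as np

def local_lipschitz_estimate(f, x, radius=0.1, n_samples=1000, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    fx = f(x)
    best = 0.0
    for _ in range(n_samples):
        delta = rng.standard_normal(x.shape)
        delta *= radius / np.linalg.norm(delta)   # point on the sphere of radius `radius`
        best = max(best, abs(f(x + delta) - fx) / radius)
    return best

# Toy usage with a hypothetical one-hidden-layer scalar predictor.
rng = np.random.default_rng(1)
W, a = rng.standard_normal((16, 8)), rng.standard_normal(16)

def f(x):
    return a @ np.maximum(W @ x, 0.0)             # scalar ReLU network

print(local_lipschitz_estimate(f, rng.standard_normal(8)))
```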
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized
Linear Models [33.36787620121057]
We prove a new generalization bound that holds for any class of linear predictors in Gaussian space.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021).
We show that application of our bound using localized Gaussian width will generally be sharp for empirical risk minimizers.
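For context, the Moreau envelope in the title is the standard smoothing of a loss $f$ at scale $\lambda > 0$ (a textbook definition, not a construction specific to this paper):
$$f_\lambda(x) = \inf_{u} \Bigl\{ f(u) + \tfrac{1}{2\lambda}\lVert x - u\rVert^2 \Bigr\},$$
which always lies below $f$ and recovers $f$ as $\lambda \to 0$.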
arXiv Detail & Related papers (2022-10-21T16:16:55Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
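A minimal sketch of this embedding (the planted block model, its edge probabilities, and $K = 2$ are hypothetical choices, and the eigenvector convention may differ from the paper's): form the random walk Laplacian $L_{rw} = I - D^{-1}A$ and represent each node by the eigenvectors of its $K$ smallest eigenvalues.

```python
# Minimal sketch: random-walk-Laplacian spectral embedding on a toy
# two-community graph.  Graph size, edge probabilities, and K are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 2
z = rng.integers(0, K, size=n)                       # hidden community labels
P = np.where(z[:, None] == z[None, :], 0.10, 0.02)   # planted block model
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                          # symmetric adjacency, no self-loops

deg = A.sum(axis=1)
deg[deg == 0] = 1.0                                  # guard against isolated nodes
L_rw = np.eye(n) - A / deg[:, None]                  # random walk Laplacian (for reference)

# Eigenvectors of L_rw's K smallest eigenvalues give the embedding.  L_rw is not
# symmetric, so diagonalize the symmetric form and rescale by D^{-1/2}.
L_sym = np.eye(n) - A / np.sqrt(deg[:, None] * deg[None, :])
vals, vecs = np.linalg.eigh(L_sym)                   # eigenvalues in ascending order
embedding = vecs[:, :K] / np.sqrt(deg[:, None])      # rows are node representations
print(embedding.shape)                               # (n, K)
```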
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that afforded by ordinary least squares.
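A minimal sketch of this setting (single pass, with a hypothetical step size and tail-averaging window, not the paper's exact scheme or regime): constant-stepsize SGD for linear regression, compared against ordinary least squares.

```python
# Minimal sketch: single-pass constant-stepsize SGD for linear regression with
# tail averaging, compared to ordinary least squares.  Dimensions, step size,
# and noise level are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr = 2000, 20, 0.01
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_star + 0.5 * rng.standard_normal(n)

w, avg, burn_in = np.zeros(d), np.zeros(d), n // 2
for t in range(n):                          # one pass over the data
    x_t, y_t = X[t], y[t]
    w -= lr * (x_t @ w - y_t) * x_t         # stochastic gradient of 0.5 * (x^T w - y)^2
    if t >= burn_in:
        avg += w
w_sgd = avg / (n - burn_in)                 # tail-averaged SGD iterate

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("SGD parameter error:", np.sum((w_sgd - w_star) ** 2))
print("OLS parameter error:", np.sum((w_ols - w_star) ** 2))
```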
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Exact Gap between Generalization Error and Uniform Convergence in Random
Feature Models [12.588676054808044]
We show that there could be a large gap between the classical uniform convergence bound and the actual test error of zero-training-error predictors (interpolators).
We derive and prove analytical expressions for three quantities in this model.
We show that, in the setting where the classical uniform convergence bound is vacuous (diverges to $\infty$), uniform convergence over the interpolators still gives a non-trivial bound on the test error of interpolating solutions.
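A minimal sketch of a random feature interpolator of the kind compared here (the ReLU feature map, widths, and toy target are hypothetical choices): draw frozen random features and take the minimum-norm coefficients that achieve zero training error.

```python
# Minimal sketch: a random ReLU feature model and its minimum-norm
# zero-training-error interpolator.  Widths, sample sizes, and the target
# function are hypothetical choices for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 100, 10, 2000                               # N random features >> n samples
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)    # toy noisy target

W = rng.standard_normal((N, d)) / np.sqrt(d)          # frozen random first-layer weights
Phi = np.maximum(X @ W.T, 0.0)                        # n x N random ReLU features

a = np.linalg.pinv(Phi) @ y                           # min-norm interpolating coefficients
print("train MSE:", np.mean((Phi @ a - y) ** 2))      # ~0: zero-training-error predictor

X_test = rng.standard_normal((500, d))
Phi_test = np.maximum(X_test @ W.T, 0.0)
print("test MSE :", np.mean((Phi_test @ a - np.sin(X_test[:, 0])) ** 2))
```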
arXiv Detail & Related papers (2021-03-08T05:20:36Z) - On the robustness of minimum-norm interpolators [0.0]
This article develops a general theory for minimum-norm interpolated estimators in linear models in the presence of additive, potentially adversarial, errors.
A quantitative bound for the prediction error is given, relating it to the Rademacher norm of the minimum norm interpolator of the errors and the shape of the subdifferential around the true parameter.
arXiv Detail & Related papers (2020-12-01T20:03:20Z) - On Uniform Convergence and Low-Norm Interpolation Learning [25.96459928769678]
We show that uniformly bounding the difference between empirical and population errors cannot show any learning in the norm ball.
We argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball.
arXiv Detail & Related papers (2020-06-10T16:49:28Z)