On Uniform Convergence and Low-Norm Interpolation Learning
- URL: http://arxiv.org/abs/2006.05942v3
- Date: Thu, 14 Jan 2021 00:22:10 GMT
- Title: On Uniform Convergence and Low-Norm Interpolation Learning
- Authors: Lijia Zhou and Danica J. Sutherland and Nathan Srebro
- Abstract summary: We show that uniformly bounding the difference between empirical and population errors cannot show any learning in the norm ball.
We argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball.
- Score: 25.96459928769678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider an underdetermined noisy linear regression model where the
minimum-norm interpolating predictor is known to be consistent, and ask: can
uniform convergence in a norm ball, or at least (following Nagarajan and
Kolter) the subset of a norm ball that the algorithm selects on a typical input
set, explain this success? We show that uniformly bounding the difference
between empirical and population errors cannot show any learning in the norm
ball, and cannot show consistency for any set, even one depending on the exact
algorithm and distribution. But we argue we can explain the consistency of the
minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform
convergence of zero-error predictors in a norm ball. We use this to bound the
generalization error of low- (but not minimal-) norm interpolating predictors.
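To make the two notions in the abstract concrete (the notation here is illustrative, not taken from the paper, with $\hat{L}$ the empirical error and $L$ the population error): standard uniform convergence over a norm ball controls $\sup_{\|w\| \le B} |L(w) - \hat{L}(w)|$, whereas uniform convergence of zero-error predictors controls only $\sup_{\|w\| \le B,\ \hat{L}(w) = 0} L(w)$, the population error of interpolators inside the ball. The sketch below is a minimal numerical illustration of the objects involved, under an isotropic Gaussian design chosen purely for simplicity (the consistency results concern particular covariance structures, so no consistency claim is intended): it computes the minimum-norm interpolator of an underdetermined noisy regression problem, builds a low- but not minimal-norm interpolator by adding a null-space component, and compares norms, training errors, and test errors.
```python
# Minimal sketch, not code from the paper: minimum-norm vs. low-norm
# interpolation in an underdetermined noisy linear regression.
# Dimensions, noise level, and the isotropic Gaussian design are assumptions
# made only for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 2000, 0.5             # n samples, d >> n features, noise level
w_star = np.zeros(d)
w_star[0] = 1.0                         # simple ground-truth signal (assumed)

X = rng.standard_normal((n, d))
y = X @ w_star + sigma * rng.standard_normal(n)

# Minimum-norm interpolator: w = X^+ y solves Xw = y with the smallest ||w||_2.
X_pinv = np.linalg.pinv(X)
w_mn = X_pinv @ y

# A low- (but not minimal-) norm interpolator: add a null-space component,
# which increases the norm but leaves the training error at exactly zero.
v = rng.standard_normal(d)
v -= X_pinv @ (X @ v)                   # project v onto the null space of X
w_low = w_mn + 0.1 * np.linalg.norm(w_mn) * v / np.linalg.norm(v)

def test_mse(w, n_test=10_000):
    """Monte Carlo estimate of the population squared error."""
    Xt = rng.standard_normal((n_test, d))
    yt = Xt @ w_star + sigma * rng.standard_normal(n_test)
    return float(np.mean((Xt @ w - yt) ** 2))

for name, w in [("min-norm", w_mn), ("low-norm", w_low)]:
    train_mse = float(np.mean((X @ w - y) ** 2))
    print(f"{name}: train MSE = {train_mse:.2e}, "
          f"||w||_2 = {np.linalg.norm(w):.2f}, test MSE = {test_mse(w):.3f}")
```
Because any null-space perturbation leaves the training error at exactly zero, a whole set of interpolators lives inside each norm ball, which is why uniform convergence over zero-error predictors, rather than a bound for a single fixed predictor, is the relevant notion.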
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting [35.78863301525758]
We prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class.
Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators.
We show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.
arXiv Detail & Related papers (2021-06-17T06:58:10Z)
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
- Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models [12.588676054808044]
We show that there could be a large gap between the classical uniform convergence bound and the actual test error of zero-training-error predictors (interpolators).
We derive and prove analytical expressions for three quantities in this model.
We show that, in the setting where the classical uniform convergence bound is vacuous (diverges to $\infty$), uniform convergence over the interpolators still gives a non-trivial bound on the test error of interpolating solutions.
arXiv Detail & Related papers (2021-03-08T05:20:36Z)
- On the robustness of minimum-norm interpolators [0.0]
This article develops a general theory for minimum-norm interpolated estimators in linear models in the presence of additive, potentially adversarial, errors.
A quantitative bound for the prediction error is given, relating it to the Rademacher norm of the minimum norm interpolator of the errors and the shape of the subdifferential around the true parameter.
arXiv Detail & Related papers (2020-12-01T20:03:20Z)
- Failures of model-dependent generalization bounds for least-norm interpolation [39.97534972432276]
We consider bounds on the generalization performance of the least-norm linear regressor.
For a variety of natural joint distributions on training examples, any valid generalization bound must sometimes be very loose.
arXiv Detail & Related papers (2020-10-16T16:30:05Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model (a toy Monte Carlo illustration of this concentration, in a regression setting, appears after this list).
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Explicit Regularization of Stochastic Gradient Methods through Duality [9.131027490864938]
We propose randomized Dykstra-style algorithms based on randomized dual coordinate ascent.
For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing gradient methods in the interpolating regime.
arXiv Detail & Related papers (2020-03-30T20:44:56Z)
- Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)
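As referenced under "Good Classifiers are Abundant in the Interpolating Regime", the sketch below is a toy Monte Carlo illustration of the concentration phenomenon that paper describes, not its analytic computation, and it is adapted to the regression setting of the main abstract rather than to classification. It samples many interpolators of slightly larger-than-minimal norm by adding random null-space directions to the minimum-norm solution, then summarizes the empirical distribution of their test errors so the typical value can be compared with the worst one sampled.
```python
# Toy Monte Carlo sketch (an illustration, not the paper's analytic method):
# empirical distribution of test errors among interpolators of bounded norm,
# in the same underdetermined regression setting as the sketch above.
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 50, 2000, 0.5
w_star = np.zeros(d)
w_star[0] = 1.0

X = rng.standard_normal((n, d))
y = X @ w_star + sigma * rng.standard_normal(n)
X_pinv = np.linalg.pinv(X)
w_mn = X_pinv @ y                        # minimum-norm interpolator

Xt = rng.standard_normal((5000, d))      # fixed fresh test set
yt = Xt @ w_star + sigma * rng.standard_normal(5000)

errors = []
for _ in range(500):
    v = rng.standard_normal(d)
    v -= X_pinv @ (X @ v)                # null-space direction, so X @ (w_mn + v) == y
    v *= 0.2 * np.linalg.norm(w_mn) / np.linalg.norm(v)
    w = w_mn + v                         # still interpolates, with a slightly larger norm
    errors.append(np.mean((Xt @ w - yt) ** 2))

errors = np.array(errors)
print(f"test MSE over sampled interpolators: "
      f"median = {np.median(errors):.3f}, "
      f"5%-95% = [{np.quantile(errors, 0.05):.3f}, {np.quantile(errors, 0.95):.3f}], "
      f"worst sampled = {errors.max():.3f}")
```
Sampling only in the null space keeps every candidate an exact interpolator, so the spread of these test errors is the kind of quantity that both the distributional analysis above and uniform convergence over interpolators aim to control.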