Harmless interpolation in regression and classification with structured
features
- URL: http://arxiv.org/abs/2111.05198v1
- Date: Tue, 9 Nov 2021 15:12:26 GMT
- Title: Harmless interpolation in regression and classification with structured
features
- Authors: Andrew D. McRae and Santhosh Karnik and Mark A. Davenport and Vidya
Muthukumar
- Abstract summary: Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data.
We present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space.
- Score: 21.064512161584872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparametrized neural networks tend to perfectly fit noisy training data
yet generalize well on test data. Inspired by this empirical observation,
recent work has sought to understand this phenomenon of benign overfitting or
harmless interpolation in the much simpler linear model. Previous theoretical
work critically assumes that either the data features are statistically
independent or the input data is high-dimensional; this precludes general
nonparametric settings with structured feature maps. In this paper, we present
a general and flexible framework for upper bounding regression and
classification risk in a reproducing kernel Hilbert space. A key contribution
is that our framework describes precise sufficient conditions on the data Gram
matrix under which harmless interpolation occurs. Our results recover prior
independent-features results (with a much simpler analysis), but they
furthermore show that harmless interpolation can occur in more general settings
such as features that are a bounded orthonormal system. Furthermore, our
results show an asymptotic separation between classification and regression
performance in a manner that was previously only shown for Gaussian features.
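As a point of reference for the setting described above, here is a minimal, self-contained sketch (not the authors' code) of minimum-norm kernel interpolation: it fits noisy training labels exactly and then measures risk against the clean target. The Gaussian kernel, bandwidth, data model, and sample sizes are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): minimum-norm kernel
# interpolation of noisy data, the kind of estimator whose risk the paper bounds.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=10.0):
    # Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2); gamma is an assumed bandwidth.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

# Synthetic noisy data: y = f*(x) + noise, with f*(x) = sin(3 * x_1).
n, d = 200, 2
X = rng.uniform(-1.0, 1.0, size=(n, d))
f_star = lambda Z: np.sin(3.0 * Z[:, 0])
y = f_star(X) + 0.3 * rng.standard_normal(n)

# Minimum-norm RKHS interpolant: f_hat(.) = sum_i alpha_i k(., x_i), alpha = K^+ y.
K = rbf_kernel(X, X)
alpha = np.linalg.pinv(K) @ y

print("train MSE:", np.mean((K @ alpha - y) ** 2))  # ~ 0: the noisy labels are fit exactly

# "Harmless" interpolation: despite fitting the noise, the risk against the
# clean target f* can stay moderate rather than blowing up.
X_test = rng.uniform(-1.0, 1.0, size=(2000, d))
y_hat = rbf_kernel(X_test, X) @ alpha
print("test MSE vs clean target:", np.mean((y_hat - f_star(X_test)) ** 2))
```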
Related papers
- Minimum-Norm Interpolation Under Covariate Shift [14.863831433459902]
In-distribution research on high-dimensional linear regression has led to the identification of a phenomenon known as benign overfitting.
We prove the first non-asymptotic excess risk bounds for benignly-overfit linear interpolators in the transfer learning setting.
arXiv Detail & Related papers (2024-03-31T01:41:57Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or data drawn from a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
- Benign overfitting in ridge regression [0.0]
We provide non-asymptotic generalization bounds for overparametrized ridge regression.
We identify when small or negative regularization is sufficient for obtaining small generalization error; a generic sketch of the ridgeless limit of ridge regression appears after this list.
arXiv Detail & Related papers (2020-09-29T20:00:31Z)
- The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization [34.235007566913396]
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well.
An emerging paradigm for describing this unexpected behavior is in terms of a double descent curve.
We provide a precise high-dimensional analysis of generalization with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks with gradient descent.
arXiv Detail & Related papers (2020-08-15T20:55:40Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Linear predictor on linearly-generated data with missing values: non consistency and solutions [0.0]
We study the seemingly-simple case where the target to predict is a linear function of the fully-observed data.
We show that, in the presence of missing values, the optimal predictor may not be linear.
arXiv Detail & Related papers (2020-02-03T11:49:35Z)
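Several entries above, notably the minimum-norm-interpolation and ridge-regression papers, concern the same family of estimators. The sketch below (generic, not taken from any of these papers) shows how, in an overparametrized linear model, the ridge solution with a vanishing regularization parameter coincides with the minimum-norm interpolator; the dimensions, sparsity, and noise level are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the papers above): in an
# overparametrized linear model, ridge regression with lambda -> 0+ recovers
# the minimum-norm interpolator of the training data.
import numpy as np

rng = np.random.default_rng(1)

n, d = 50, 500                                   # many more features than samples
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 1.0                              # assumed sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    # Dual form, convenient when d > n: beta = X^T (X X^T + lam * I)^{-1} y.
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), y)

beta_min_norm = np.linalg.pinv(X) @ y            # minimum-norm interpolator
beta_ridge = ridge(X, y, lam=1e-8)               # ridge with vanishing lambda

print("||ridge(1e-8) - min-norm||:", np.linalg.norm(beta_ridge - beta_min_norm))
print("train residual of min-norm fit:", np.linalg.norm(X @ beta_min_norm - y))  # ~ 0
```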
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.