Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime
- URL: http://arxiv.org/abs/2003.01054v2
- Date: Fri, 3 Apr 2020 07:42:38 GMT
- Title: Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime
- Authors: Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala
- Abstract summary: Deep neural networks can achieve remarkable performance while interpolating the training data perfectly.
Rather than the U-curve of the bias-variance trade-off, their test error often follows a "double descent" curve.
We develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks.
- Score: 32.65347128465841
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks can achieve remarkable generalization
performance while interpolating the training data perfectly. Rather than the
U-curve emblematic
of the bias-variance trade-off, their test error often follows a "double
descent" - a mark of the beneficial role of overparametrization. In this work,
we develop a quantitative theory for this phenomenon in the so-called lazy
learning regime of neural networks, by considering the problem of learning a
high-dimensional function with random features regression. We obtain a precise
asymptotic expression for the bias-variance decomposition of the test error,
and show that the bias displays a phase transition at the interpolation
threshold, beyond which it remains constant. We disentangle the variances
stemming from the sampling of the dataset, from the additive noise corrupting
the labels, and from the initialization of the weights. Following up on Geiger
et al. (2019), we first show that the latter two contributions are the crux of
the double descent: they lead to the overfitting peak at the interpolation
threshold and to the decay of the test error upon overparametrization. We then
quantify how they are suppressed by ensemble averaging the outputs of K
independently initialized estimators. When K is sent to infinity, the test
error remains constant beyond the interpolation threshold. We further compare
the effects of overparametrizing, ensembling and regularizing. Finally, we
present numerical experiments on classic deep learning setups to show that our
results hold qualitatively in realistic lazy learning scenarios.
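The mechanism described in the abstract is easy to probe numerically. Below is a minimal sketch (an assumed setup, not the authors' code) of ridgeless random features regression on a synthetic linear teacher: sweeping the number of features P across the interpolation threshold P = n should reproduce the single-estimator overfitting peak, while averaging K = 10 independently initialized estimators suppresses the contributions from initialization and noise, as the abstract describes. All names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def rf_fit_predict(X_tr, y_tr, X_te, P, seed):
    """Ridgeless random features regression: a fixed random first layer
    followed by a minimum-norm least-squares fit of the second layer."""
    d = X_tr.shape[1]
    W = np.random.default_rng(seed).normal(size=(d, P)) / np.sqrt(d)
    Z_tr, Z_te = relu(X_tr @ W), relu(X_te @ W)
    a = np.linalg.pinv(Z_tr) @ y_tr        # minimum-norm interpolator when P > n
    return Z_te @ a

# Synthetic teacher: linear target plus additive label noise.
n, d, noise = 100, 50, 0.5
beta = rng.normal(size=d) / np.sqrt(d)
X_tr, X_te = rng.normal(size=(n, d)), rng.normal(size=(1000, d))
y_tr = X_tr @ beta + noise * rng.normal(size=n)
y_te = X_te @ beta

for P in [25, 50, 100, 200, 400, 800]:     # interpolation threshold at P = n = 100
    preds = np.stack([rf_fit_predict(X_tr, y_tr, X_te, P, s) for s in range(10)])
    single = np.mean((preds[0] - y_te) ** 2)               # one estimator
    ensemble = np.mean((preds.mean(axis=0) - y_te) ** 2)   # K = 10 average
    print(f"P={P:4d}  single={single:8.3f}  ensemble={ensemble:8.3f}")
```

Qualitatively, the single-estimator error should spike near P = n and decay with further overparametrization, while the ensembled error stays much flatter past the threshold, consistent with the K sent to infinity claim.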
Related papers
- Multiple Descents in Unsupervised Learning: The Role of Noise, Domain Shift and Anomalies [14.399035468023161]
We study the presence of double descent in unsupervised learning, an area that has received little attention and is not yet fully understood.
We use synthetic and real data and identify model-wise, epoch-wise, and sample-wise double descent for various applications.
arXiv Detail & Related papers (2024-06-17T16:24:23Z)
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show exactly when and where double descent occurs, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- SLA$^2$P: Self-supervised Anomaly Detection with Adversarial Perturbation [77.71161225100927]
Anomaly detection is a fundamental yet challenging problem in machine learning.
We propose a novel and powerful framework, dubbed SLA$^2$P, for unsupervised anomaly detection.
arXiv Detail & Related papers (2021-11-25T03:53:43Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically; a toy sketch of this training setup follows the entry.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
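Since the entry above concerns RF models optimized with SGD under constant and adaptive step sizes, here is a hypothetical minimal sketch of that kind of setup: the random first layer is frozen and the second layer is fit by single-sample SGD on the squared loss, comparing a constant step with a simple 1/sqrt(t) decay as a stand-in for an adaptive schedule. The constants and the decay rule are assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen random ReLU features; only the second layer is trained.
n, d, P = 200, 30, 400
X = rng.normal(size=(n, d))
y = X @ (rng.normal(size=d) / np.sqrt(d)) + 0.1 * rng.normal(size=n)
W = rng.normal(size=(d, P)) / np.sqrt(d)   # fixed random first layer
Z = np.maximum(X @ W, 0.0)                 # ReLU random features

def sgd_train_error(step_at, epochs=50):
    """Single-sample SGD on the squared loss; step_at(t) gives the step size."""
    a = np.zeros(P)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            residual = Z[i] @ a - y[i]
            a -= step_at(t) * residual * Z[i]
    return np.mean((Z @ a - y) ** 2)

print("constant step :", sgd_train_error(lambda t: 1e-3))
print("decaying step :", sgd_train_error(lambda t: 1e-2 / np.sqrt(t + 10)))
```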
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the resulting classifier.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)