Double Descent Demystified: Identifying, Interpreting & Ablating the
Sources of a Deep Learning Puzzle
- URL: http://arxiv.org/abs/2303.14151v1
- Date: Fri, 24 Mar 2023 17:03:40 GMT
- Title: Double Descent Demystified: Identifying, Interpreting & Ablating the
Sources of a Deep Learning Puzzle
- Authors: Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy,
Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo
- Abstract summary: Double descent is a surprising phenomenon in machine learning.
As the number of model parameters grows relative to the number of data points, test error drops again in the highly overparameterized regime.
- Score: 12.00962791565144
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Double descent is a surprising phenomenon in machine learning, in
which test error drops as the number of model parameters grows relative to the
number of data points, with models growing ever larger into the highly
overparameterized (data-undersampled) regime. This drop in test error flies in
the face of classical learning theory on overfitting and has arguably
underpinned the success of large models in machine learning. This
non-monotonic behavior of test loss depends on the number of data points, the
dimensionality of the data, and the number of model parameters. Here, we
briefly describe double descent, then provide an
explanation of why double descent occurs in an informal and approachable
manner, requiring only familiarity with linear algebra and introductory
probability. We provide visual intuition using polynomial regression, then
mathematically analyze double descent with ordinary linear regression and
identify three interpretable factors that, when all present simultaneously,
together create double descent. We demonstrate that double descent occurs on
real data when using ordinary linear regression, then demonstrate that double
descent does not occur when any of the three factors are ablated. We use this
understanding to shed light on recent observations in nonlinear models
concerning superposition and double descent. Code is publicly available.
Related papers
- Towards understanding epoch-wise double descent in two-layer linear neural networks [11.210628847081097] (2024-07-13)
We study epoch-wise double descent in two-layer linear neural networks.
We identify additional factors of epoch-wise double descent that emerge with the extra model layer.
This opens up further questions about unidentified factors of epoch-wise double descent in truly deep models.
- Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115] (2024-03-15)
This tutorial sets up the classical statistical learning framework and introduces the double descent phenomenon.
By looking at a number of examples, Section 2 introduces inductive biases that appear to play a key role in double descent by selecting, among the multiple interpolating solutions, a smoothly interpolating one.
Section 3 explores double descent with two linear models and gives other points of view from recent related works.
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095] (2023-10-29)
We show that the second descent appears exactly when and where the transition between distinct underlying complexity axes occurs, and that its location is not inherently tied to the interpolation threshold p = n.
This resolves the tension between double descent and statistical intuition.
- Analysis of Interpolating Regression Models and the Double Descent Phenomenon [3.883460584034765] (2023-04-17)
It is commonly assumed that models which interpolate noisy training data generalize poorly.
The best models obtained are overparameterized, and the testing error exhibits double descent behavior as the model order increases.
We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double descent shape of the testing error as a function of model order.
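The singular-value mechanism named in this summary is easy to see numerically. The toy sketch below assumes a Gaussian design matrix (not necessarily the paper's setup): the smallest singular value of the regression matrix collapses as the model order p approaches the number of samples n, which is where a minimum-norm fit amplifies noise and the testing error peaks.

# Toy sketch: smallest singular value of the design matrix vs. model order.
import numpy as np

rng = np.random.default_rng(1)
n = 20
X = rng.normal(size=(n, 100))  # n samples, up to 100 features
for p in [5, 10, 15, 18, 20, 22, 30, 60, 100]:
    # svd returns min(n, p) singular values; the smallest one dips
    # toward zero as p approaches n (the interpolation threshold).
    sigma_min = np.linalg.svd(X[:, :p], compute_uv=False).min()
    print(f"p={p:3d}  smallest singular value={sigma_min:.3f}")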
- What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464] (2022-11-28)
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We present preliminary evidence that in-context learners share algorithmic features with these predictors.
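The three reference predictors named here are standard; the self-contained sketch below (illustrative data and hyperparameters, not the paper's transformer experiments) makes explicit what each one computes on a small linear regression problem.

# Reference predictors that in-context learners are compared against:
# exact least squares, ridge regression, and plain gradient descent.
import numpy as np

rng = np.random.default_rng(2)
n, d, lam, lr, steps = 16, 8, 0.1, 0.05, 200
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_query = rng.normal(size=d)

# Exact (minimum-norm) least squares.
w_ls = np.linalg.pinv(X) @ y
# Ridge regression with regularization strength lam.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
# Plain gradient descent on the squared loss from zero initialization.
w_gd = np.zeros(d)
for _ in range(steps):
    w_gd -= lr * (X.T @ (X @ w_gd - y)) / n

print("least-squares prediction:   ", x_query @ w_ls)
print("ridge prediction:           ", x_query @ w_ridge)
print("gradient-descent prediction:", x_query @ w_gd)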
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857] (2021-12-06)
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544] (2021-06-06)
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315] (2021-06-03)
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
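As a rough illustration only (the paper gives a precise definition of OV; the proxy below simply measures how much minibatch gradient updates disagree for a linear model on synthetic data, and every quantity here is an assumed stand-in, not the paper's metric):

# Hypothetical proxy for "diversity of model updates": variance of
# per-minibatch gradient updates for a linear model on synthetic data.
import numpy as np

rng = np.random.default_rng(3)
n, d, batch = 256, 10, 32
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = np.zeros(d)  # current model parameters

grads = []
for i in range(0, n, batch):
    Xb, yb = X[i:i + batch], y[i:i + batch]
    grads.append(Xb.T @ (Xb @ w - yb) / batch)  # squared-loss gradient
grads = np.stack(grads)
# Spread of minibatch updates around their mean direction.
update_variance = np.mean(np.var(grads, axis=0))
print(f"update variance across minibatches: {update_variance:.4f}")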
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.