Multi-scale Feature Learning Dynamics: Insights for Double Descent
- URL: http://arxiv.org/abs/2112.03215v1
- Date: Mon, 6 Dec 2021 18:17:08 GMT
- Title: Multi-scale Feature Learning Dynamics: Insights for Double Descent
- Authors: Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie
- Abstract summary: We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
- Score: 71.91871020059857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in building theoretical foundations for deep learning is the
complex optimization dynamics of neural networks, resulting from the
high-dimensional interactions between the large number of network parameters.
Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon
of "double descent" of the generalization error. The more commonly studied
aspect of this phenomenon corresponds to model-wise double descent where the
test error exhibits a second descent with increasing model complexity, beyond
the classical U-shaped error curve. In this work, we investigate the origins of
the less studied epoch-wise double descent, in which the test error undergoes
two non-monotonic transitions, or descents, as the training time increases. By
leveraging tools from statistical physics, we study a linear teacher-student
setup exhibiting epoch-wise double descent similar to that in deep neural
networks. In this setting, we derive closed-form analytical expressions for the
evolution of generalization error over training. We find that double descent
can be attributed to distinct features being learned at different scales: as
fast-learning features overfit, slower-learning features start to fit,
resulting in a second descent in test error. We validate our findings through
numerical experiments in which our theory accurately predicts the empirical results
and remains consistent with observations in deep neural networks.
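As a rough numerical illustration of this mechanism (not the paper's closed-form analysis), the sketch below trains a linear student by full-batch gradient descent on a linear teacher whose inputs contain a large-scale "fast" block and a small-scale "slow" block, with noisy labels. All sizes, feature scales, the noise level, and the learning rate are illustrative assumptions; with settings like these, the printed test error typically falls as the fast features fit the signal, rises as they begin to fit the label noise, and falls again once the slow features are learned.

```python
# Minimal sketch of epoch-wise double descent from two feature scales
# (illustrative parameters only, not the paper's exact configuration).
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test = 300, 2000
d_fast, d_slow = 100, 100            # number of fast- and slow-learning features
scale_fast, scale_slow = 1.0, 0.1    # feature scales set the learning speeds
noise_std = 1.5                      # label noise on the training targets

def sample_inputs(n):
    """Gaussian inputs with a large-scale (fast) block and a small-scale (slow) block."""
    fast = scale_fast * rng.standard_normal((n, d_fast))
    slow = scale_slow * rng.standard_normal((n, d_slow))
    return np.hstack([fast, slow])

# Linear teacher, normalized so each block carries unit signal power.
w_star = rng.standard_normal(d_fast + d_slow)
w_star[:d_fast] /= scale_fast * np.linalg.norm(w_star[:d_fast])
w_star[d_fast:] /= scale_slow * np.linalg.norm(w_star[d_fast:])

X_train, X_test = sample_inputs(n_train), sample_inputs(n_test)
y_train = X_train @ w_star + noise_std * rng.standard_normal(n_train)
y_test = X_test @ w_star             # noiseless targets: test MSE tracks excess risk

# Full-batch gradient descent on the mean-squared error, logged on a log-spaced grid.
lr, n_steps = 0.2, 30000
log_at = set(np.unique(np.logspace(0, np.log10(n_steps), 40).astype(int)))
w = np.zeros(d_fast + d_slow)

for step in range(1, n_steps + 1):
    grad = X_train.T @ (X_train @ w - y_train) / n_train
    w -= lr * grad
    if step in log_at:
        test_mse = np.mean((X_test @ w - y_test) ** 2)
        print(f"step {step:6d}   test MSE {test_mse:.3f}")
```

The separation between the two phases is set by the squared feature scales: under gradient descent on the quadratic loss, the slow block above is learned roughly (scale_fast / scale_slow)^2 = 100 times more slowly than the fast one.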
Related papers
- Towards understanding epoch-wise double descent in two-layer linear neural networks [11.210628847081097]
We study epoch-wise double descent in two-layer linear neural networks.
We identify additional factors of epoch-wise double descent emerging with the extra model layer.
This opens up further questions about unidentified factors of epoch-wise double descent in truly deep models.
arXiv Detail & Related papers (2024-07-13T10:45:21Z) - Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets up the classical statistical learning framework and introduces the double descent phenomenon.
By looking at a number of examples, Section 2 introduces inductive biases that appear to play a key role in double descent by selecting among interpolating solutions.
Section 3 explores double descent in two linear models and gives other points of view from recent related works.
arXiv Detail & Related papers (2024-03-15T16:51:24Z) - The twin peaks of learning neural networks [3.382017614888546]
Recent works demonstrated the existence of a double-descent phenomenon for the generalization error of neural networks.
We explore a link between this phenomenon and the increase of complexity and sensitivity of the function represented by neural networks.
arXiv Detail & Related papers (2024-01-23T10:09:14Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We show when and where double descent occurs, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Double Descent Demystified: Identifying, Interpreting & Ablating the
Sources of a Deep Learning Puzzle [12.00962791565144]
Double descent is a surprising phenomenon in machine learning.
As the number of model parameters grows relative to the number of data points, the test error first falls, then rises, then falls again (a numerical sketch of this model-wise behavior appears after this list).
arXiv Detail & Related papers (2023-03-24T17:03:40Z) - Learning time-scales in two-layers neural networks [11.878594839685471]
We study the gradient flow dynamics of a wide two-layer neural network in high-dimension.
Based on new rigorous results, we propose a scenario for the learning dynamics in this setting.
arXiv Detail & Related papers (2023-02-28T19:52:26Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z) - Early Stopping in Deep Networks: Double Descent and How to Eliminate it [30.61588337557343]
We show that epoch-wise double descent arises because different parts of the network are learned at different epochs.
We study two standard convolutional networks empirically and show that eliminating epoch-wise double descent by adjusting the stepsizes of different layers significantly improves early-stopping performance (a sketch of per-layer learning rates appears after this list).
arXiv Detail & Related papers (2020-07-20T13:43:33Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
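As referenced in the "Double Descent Demystified" entry above, here is a minimal, self-contained illustration of model-wise double descent: least-squares regression on p random ReLU features, sweeping p past the number of training samples. The target function, sizes, and noise level are illustrative assumptions, not taken from that paper; with settings like these, the printed test error typically falls as p grows, spikes around p = n_train (the interpolation threshold), and falls again well past it.

```python
# Rough sketch of model-wise double descent with random ReLU features
# (illustrative setup, not the ablations of the cited paper).
import numpy as np

rng = np.random.default_rng(1)

n_train, n_test = 40, 1000
x_train = rng.uniform(-1.0, 1.0, n_train)
x_test = rng.uniform(-1.0, 1.0, n_test)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(n_train)
y_test = np.sin(2 * np.pi * x_test)          # noiseless targets for evaluation

def relu_features(x, p, seed=0):
    """p random ReLU features max(0, a*x + b) with fixed random a, b."""
    fr = np.random.default_rng(seed)
    a = fr.standard_normal(p)
    b = fr.uniform(-1.0, 1.0, p)
    return np.maximum(0.0, np.outer(x, a) + b)

for p in [2, 5, 10, 20, 35, 40, 45, 60, 100, 300, 1000]:
    phi_train = relu_features(x_train, p)
    phi_test = relu_features(x_test, p)
    # pinv gives ordinary least squares for p < n_train and the
    # minimum-norm interpolating solution for p >= n_train.
    w = np.linalg.pinv(phi_train) @ y_train
    mse = np.mean((phi_test @ w - y_test) ** 2)
    print(f"p = {p:4d}   test MSE = {mse:.3f}")
```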
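As referenced in the "Early Stopping in Deep Networks" entry above, a minimal sketch of the per-layer step-size idea: PyTorch optimizers accept parameter groups, so different layers can be given different learning rates. The architecture and the specific rates below are illustrative assumptions, not the settings used in that paper.

```python
# Minimal sketch: per-layer learning rates via optimizer parameter groups
# (illustrative model and rates, not the cited paper's configuration).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)

# One parameter group per stage, each with its own step size.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 0.05},  # first conv layer
        {"params": model[2].parameters(), "lr": 0.02},  # second conv layer
        {"params": model[6].parameters(), "lr": 0.01},  # linear head
    ],
    momentum=0.9,
)

# Dummy batch to confirm the setup runs end to end.
x = torch.randn(8, 3, 32, 32)
loss = nn.functional.cross_entropy(model(x), torch.randint(0, 10, (8,)))
loss.backward()
optimizer.step()
```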
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.