Multi-scale Feature Learning Dynamics: Insights for Double Descent
- URL: http://arxiv.org/abs/2112.03215v1
- Date: Mon, 6 Dec 2021 18:17:08 GMT
- Title: Multi-scale Feature Learning Dynamics: Insights for Double Descent
- Authors: Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie
- Abstract summary: We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
- Score: 71.91871020059857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in building theoretical foundations for deep learning is the
complex optimization dynamics of neural networks, resulting from the
high-dimensional interactions between the large number of network parameters.
Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon
of "double descent" of the generalization error. The more commonly studied
aspect of this phenomenon corresponds to model-wise double descent where the
test error exhibits a second descent with increasing model complexity, beyond
the classical U-shaped error curve. In this work, we investigate the origins of
the less studied epoch-wise double descent, in which the test error undergoes
two non-monotonic transitions, or descents, as the training time increases. By
leveraging tools from statistical physics, we study a linear teacher-student
setup exhibiting epoch-wise double descent similar to that in deep neural
networks. In this setting, we derive closed-form analytical expressions for the
evolution of generalization error over training. We find that double descent
can be attributed to distinct features being learned at different scales: as
fast-learning features overfit, slower-learning features start to fit,
resulting in a second descent in test error. We validate our findings through
numerical experiments in which our theory accurately predicts the empirical results
and remains consistent with observations in deep neural networks.
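As a rough numerical illustration of this mechanism (not the paper's closed-form analysis), the sketch below trains a linear student by full-batch gradient descent on a linear teacher whose inputs contain a large-scale "fast" block and a small-scale "slow" block, with noisy labels. All sizes, feature scales, the noise level, and the learning rate are illustrative assumptions; with settings like these, the printed test error typically falls as the fast features fit the signal, rises as they begin to fit the label noise, and falls again once the slow features are learned.

```python
# Minimal sketch of epoch-wise double descent from two feature scales
# (illustrative parameters only, not the paper's exact configuration).
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test = 300, 2000
d_fast, d_slow = 100, 100            # number of fast- and slow-learning features
scale_fast, scale_slow = 1.0, 0.1    # feature scales set the learning speeds
noise_std = 1.5                      # label noise on the training targets

def sample_inputs(n):
    """Gaussian inputs with a large-scale (fast) block and a small-scale (slow) block."""
    fast = scale_fast * rng.standard_normal((n, d_fast))
    slow = scale_slow * rng.standard_normal((n, d_slow))
    return np.hstack([fast, slow])

# Linear teacher, normalized so each block carries unit signal power.
w_star = rng.standard_normal(d_fast + d_slow)
w_star[:d_fast] /= scale_fast * np.linalg.norm(w_star[:d_fast])
w_star[d_fast:] /= scale_slow * np.linalg.norm(w_star[d_fast:])

X_train, X_test = sample_inputs(n_train), sample_inputs(n_test)
y_train = X_train @ w_star + noise_std * rng.standard_normal(n_train)
y_test = X_test @ w_star             # noiseless targets: test MSE tracks excess risk

# Full-batch gradient descent on the mean-squared error, logged on a log-spaced grid.
lr, n_steps = 0.2, 30000
log_at = set(np.unique(np.logspace(0, np.log10(n_steps), 40).astype(int)))
w = np.zeros(d_fast + d_slow)

for step in range(1, n_steps + 1):
    grad = X_train.T @ (X_train @ w - y_train) / n_train
    w -= lr * grad
    if step in log_at:
        test_mse = np.mean((X_test @ w - y_test) ** 2)
        print(f"step {step:6d}   test MSE {test_mse:.3f}")
```

The separation between the two phases is set by the squared feature scales: under gradient descent on the quadratic loss, the slow block above is learned roughly (scale_fast / scale_slow)^2 = 100 times more slowly than the fast one.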
Related papers
- Towards understanding epoch-wise double descent in two-layer linear neural networks [11.210628847081097]
We study epoch-wise double descent in two-layer linear neural networks.
We identify additional factors of epoch-wise double descent emerging with the extra model layer.
This opens up further questions about unidentified factors of epoch-wise double descent in truly deep models.
arXiv Detail & Related papers (2024-07-13T10:45:21Z) - Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets up the classical statistical learning framework and introduces the double descent phenomenon.
By looking at a number of examples, Section 2 introduces inductive biases that appear to play a key role in double descent by selecting among interpolating solutions.
Section 3 explores double descent in two linear models and gives other points of view from recent related works.
arXiv Detail & Related papers (2024-03-15T16:51:24Z) - The twin peaks of learning neural networks [3.382017614888546]
Recent works demonstrated the existence of a double-descent phenomenon for the generalization error of neural networks.
We explore a link between this phenomenon and the increase of complexity and sensitivity of the function represented by neural networks.
arXiv Detail & Related papers (2024-01-23T10:09:14Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We show when and where double descent occurs, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Double Descent Demystified: Identifying, Interpreting & Ablating the
Sources of a Deep Learning Puzzle [12.00962791565144]
Double descent is a surprising phenomenon in machine learning.
As the number of model parameters grows relative to the number of data points, the test error first falls, then rises, then falls again (a numerical sketch of this model-wise behavior appears after this list).
arXiv Detail & Related papers (2023-03-24T17:03:40Z) - Learning time-scales in two-layers neural networks [11.878594839685471]
We study the gradient flow dynamics of a wide two-layer neural network in high-dimension.
Based on new rigorous results, we propose a scenario for the learning dynamics in this setting.
arXiv Detail & Related papers (2023-02-28T19:52:26Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z) - Early Stopping in Deep Networks: Double Descent and How to Eliminate it [30.61588337557343]
We show that epoch-wise double descent arises because different parts of the network are learned at different epochs.
We study two standard convolutional networks empirically and show that eliminating epoch-wise double descent by adjusting the stepsizes of different layers significantly improves early-stopping performance (a sketch of per-layer learning rates appears after this list).
arXiv Detail & Related papers (2020-07-20T13:43:33Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
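As referenced in the "Double Descent Demystified" entry above, here is a minimal, self-contained illustration of model-wise double descent: least-squares regression on p random ReLU features, sweeping p past the number of training samples. The target function, sizes, and noise level are illustrative assumptions, not taken from that paper; with settings like these, the printed test error typically falls as p grows, spikes around p = n_train (the interpolation threshold), and falls again well past it.

```python
# Rough sketch of model-wise double descent with random ReLU features
# (illustrative setup, not the ablations of the cited paper).
import numpy as np

rng = np.random.default_rng(1)

n_train, n_test = 40, 1000
x_train = rng.uniform(-1.0, 1.0, n_train)
x_test = rng.uniform(-1.0, 1.0, n_test)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(n_train)
y_test = np.sin(2 * np.pi * x_test)          # noiseless targets for evaluation

def relu_features(x, p, seed=0):
    """p random ReLU features max(0, a*x + b) with fixed random a, b."""
    fr = np.random.default_rng(seed)
    a = fr.standard_normal(p)
    b = fr.uniform(-1.0, 1.0, p)
    return np.maximum(0.0, np.outer(x, a) + b)

for p in [2, 5, 10, 20, 35, 40, 45, 60, 100, 300, 1000]:
    phi_train = relu_features(x_train, p)
    phi_test = relu_features(x_test, p)
    # pinv gives ordinary least squares for p < n_train and the
    # minimum-norm interpolating solution for p >= n_train.
    w = np.linalg.pinv(phi_train) @ y_train
    mse = np.mean((phi_test @ w - y_test) ** 2)
    print(f"p = {p:4d}   test MSE = {mse:.3f}")
```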
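As referenced in the "Early Stopping in Deep Networks" entry above, a minimal sketch of the per-layer step-size idea: PyTorch optimizers accept parameter groups, so different layers can be given different learning rates. The architecture and the specific rates below are illustrative assumptions, not the settings used in that paper.

```python
# Minimal sketch: per-layer learning rates via optimizer parameter groups
# (illustrative model and rates, not the cited paper's configuration).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)

# One parameter group per stage, each with its own step size.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 0.05},  # first conv layer
        {"params": model[2].parameters(), "lr": 0.02},  # second conv layer
        {"params": model[6].parameters(), "lr": 0.01},  # linear head
    ],
    momentum=0.9,
)

# Dummy batch to confirm the setup runs end to end.
x = torch.randn(8, 3, 32, 32)
loss = nn.functional.cross_entropy(model(x), torch.randint(0, 10, (8,)))
loss.backward()
optimizer.step()
```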
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.