Related papers: Understanding the Double Descent Phenomenon in Deep Learning

URL: http://arxiv.org/abs/2403.10459v1
Date: Fri, 15 Mar 2024 16:51:24 GMT
Title: Understanding the Double Descent Phenomenon in Deep Learning
Authors: Marc Lafon, Alexandre Thomas
Abstract summary: This tutorial sets the classical statistical learning framework and introduces the double descent phenomenon. By looking at a number of examples, section 2 introduces inductive biases that appear to have a key role in double descent by selecting, among the multiple interpolating solutions, a smooth empirical risk minimizer. Section 3 explores the double descent with two linear models, and gives other points of view from recent related works.
Score: 49.1574468325115
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Combining empirical risk minimization with capacity control is a classical strategy in machine learning when trying to control the generalization gap and avoid overfitting, as the model class capacity gets larger. Yet, in modern deep learning practice, very large over-parameterized models (e.g. neural networks) are optimized to fit perfectly the training data and still obtain great generalization performance. Past the interpolation point, increasing model complexity seems to actually lower the test error. In this tutorial, we explain the concept of double descent and its mechanisms. The first section sets the classical statistical learning framework and introduces the double descent phenomenon. By looking at a number of examples, section 2 introduces inductive biases that appear to have a key role in double descent by selecting, among the multiple interpolating solutions, a smooth empirical risk minimizer. Finally, section 3 explores the double descent with two linear models, and gives other points of view from recent related works.

Related papers

A Classical View on Benign Overfitting: The Role of Sample Size [14.36840959836957]
We focus on almost benign overfitting, where models simultaneously achieve both arbitrarily small training and test errors. This behavior is characteristic of neural networks, which often achieve low (but non-zero) training error while still generalizing well.
arXiv Detail & Related papers (2025-05-16T18:37:51Z)
The Double Descent Behavior in Two Layer Neural Network for Binary Classification [46.3107850275261]
Recent studies observed a surprising concept on model test error called the double descent phenomenon. Our aim is to observe and investigate the mathematical theory behind the double descent behavior of model test error for varying model sizes.
arXiv Detail & Related papers (2025-04-27T20:29:24Z)
Towards understanding epoch-wise double descent in two-layer linear neural networks [11.210628847081097]
We study epoch-wise double descent in two-layer linear neural networks. We identify additional factors of epoch-wise double descent emerging with the extra model layer. This opens up for further questions regarding unidentified factors of epoch-wise double descent for truly deep models.
arXiv Detail & Related papers (2024-07-13T10:45:21Z)
A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show that double descent appears exactly when and where it occurs, and that its location is not inherently tied to the threshold p=n. This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants [166.916517335816]
In this paper, we offer a unified solution to the misalignment dilemma in the three tasks. We propose neural collapse terminus that is a fixed structure with the maximal equiangular inter-class separation for the whole label space. Our method holds the neural collapse optimality in an incremental fashion regardless of data imbalance or data scarcity.
arXiv Detail & Related papers (2023-08-03T13:09:59Z)
Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features. We find new and interesting properties that do not exist in single-task linear regression. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle [12.00962791565144]
Double descent is a surprising phenomenon in machine learning. As the number of model parameters grows relative to the number of data, test error drops.
arXiv Detail & Related papers (2023-03-24T17:03:40Z)
Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error. We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective. We find that in the case of noiseless regression, double descent is explained solely by optimization-related quantities. We empirically explore whether our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z)
Double Descent and Other Interpolation Phenomena in GANs [2.7007335372861974]
We study the generalization error as a function of latent space dimension in generative adversarial networks (GANs). We develop a novel pseudo-supervised learning approach for GANs where the training utilizes pairs of fabricated (noise) inputs in conjunction with real output samples. While our analysis focuses mostly on linear models, we also apply important insights for improving generalization of nonlinear, multilayer GANs.
arXiv Detail & Related papers (2021-06-07T23:07:57Z)
Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss. We examine how these benign overfitting phenomena occur in a two-layer neural network setting. We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.