Deep Double Descent via Smooth Interpolation
- URL: http://arxiv.org/abs/2209.10080v4
- Date: Sat, 8 Apr 2023 21:31:35 GMT
- Title: Deep Double Descent via Smooth Interpolation
- Authors: Matteo Gamba and Erik Englesson and Mårten Björkman and Hossein
Azizpour
- Abstract summary: We quantify sharpness of fit of training data by studying the loss landscape w.r.t. the input variable locally to each training point.
Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy targets.
While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, in contrast to existing intuition.
- Score: 2.141079906482723
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The ability of overparameterized deep networks to interpolate noisy data,
while at the same time showing good generalization performance, has been
recently characterized in terms of the double descent curve for the test error.
Common intuition from polynomial regression suggests that overparameterized
networks are able to sharply interpolate noisy data, without considerably
deviating from the ground-truth signal, thus preserving generalization ability.
At present, a precise characterization of the relationship between
interpolation and generalization for deep networks is missing. In this work, we
quantify sharpness of fit of the training data interpolated by neural network
functions, by studying the loss landscape w.r.t. the input variable locally
to each training point, over volumes around cleanly- and noisily-labelled
training samples, as we systematically increase the number of model parameters
and training epochs. Our findings show that loss sharpness in the input space
follows both model- and epoch-wise double descent, with worse peaks observed
around noisy labels. While small interpolating models sharply fit both clean
and noisy data, large interpolating models express a smooth loss landscape,
where noisy targets are predicted over large volumes around training data
points, in contrast to existing intuition.
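The abstract's central quantity is the sharpness of the loss landscape w.r.t. the input, measured locally around each training point. The paper's exact measurement protocol is not reproduced here; as a minimal numpy sketch of the general idea, the function below (a hypothetical helper, not from the paper) estimates input-space sharpness as the average finite-difference gradient norm of the squared loss over a small ball around a training point. The radius, sample count, and toy interpolants are illustrative assumptions.

```python
import numpy as np

def input_space_sharpness(f, x, y, radius=0.1, eps=1e-4, n_samples=32, seed=0):
    """Estimate sharpness of fit around a training point (x, y): the average
    finite-difference gradient norm of the squared loss w.r.t. the *input*,
    sampled over a ball of the given radius around x.

    Illustrative sketch only; not the paper's measurement protocol."""
    rng = np.random.default_rng(seed)
    grad_norms = []
    for _ in range(n_samples):
        # draw a perturbed input from a ball of the given radius around x
        u = rng.normal(size=x.shape)
        z = x + radius * rng.uniform() * u / np.linalg.norm(u)
        # central finite differences of the loss along each input coordinate
        g = np.zeros_like(z)
        for i in range(z.size):
            e = np.zeros_like(z)
            e[i] = eps
            g[i] = ((f(z + e) - y) ** 2 - (f(z - e) - y) ** 2) / (2 * eps)
        grad_norms.append(np.linalg.norm(g))
    return float(np.mean(grad_norms))

# Toy comparison: two interpolants that both fit (x0, y0) exactly, one with a
# shallow slope (smooth fit) and one with a steep slope (sharp fit).
x0, y0 = np.array([0.0, 0.0]), 1.0
smooth = lambda z: 1.0 + 0.1 * z.sum()
sharp = lambda z: 1.0 + 5.0 * z.sum()
```

Under this toy measure, the steep interpolant scores far higher sharpness than the shallow one at the same training point, mirroring the distinction the abstract draws between small models that fit sharply and large models that express a smooth loss landscape.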
Related papers
- Simplicity bias and optimization threshold in two-layer ReLU networks [24.43739371803548]
We show that despite overparametrization, networks converge toward simpler solutions rather than interpolating the training data.
Our analysis relies on the so called early alignment phase, during which neurons align towards specific directions.
arXiv Detail & Related papers (2024-10-03T09:58:57Z) - Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z) - Contrasting random and learned features in deep Bayesian linear
regression [12.234742322758418]
We study how the ability to learn affects the generalization performance of a simple class of models.
By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch.
arXiv Detail & Related papers (2022-03-01T15:51:29Z) - Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is yet very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data.
Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets.
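The summary above describes modeling label noise as sparse and incoherent with the signal, then separating it from the data. The paper's actual over-parameterized training method differs; as a hedged illustration of the separation idea only, the toy scheme below (my own choice of an L1 penalty solved by alternating least squares and soft-thresholding, not the paper's algorithm) fits a linear model while absorbing sparse label corruption into an explicit noise term.

```python
import numpy as np

def fit_with_sparse_noise(X, y, lam=0.5, n_iter=200):
    """Least-squares fit that models label noise as a sparse additive term:
        min_{w, s}  ||X w + s - y||^2 + lam * ||s||_1
    solved by alternating exact minimization: least squares for w, then
    elementwise soft-thresholding (the L1 proximal step) for s.

    Illustrative sketch of separating sparse label noise; not the
    paper's method."""
    n, d = X.shape
    s = np.zeros(n)
    w = np.zeros(d)
    for _ in range(n_iter):
        # fit the model to the de-noised labels y - s
        w, *_ = np.linalg.lstsq(X, y - s, rcond=None)
        # soft-threshold the residuals: small (clean) residuals go to zero,
        # large (corrupted) residuals are captured by s
        r = y - X @ w
        s = np.sign(r) * np.maximum(np.abs(r) - lam / 2, 0.0)
    return w, s
```

On data where a few labels carry large corruptions, the recovered `s` is nonzero almost exactly on the corrupted samples, and `w` stays close to the clean solution, which is the intuition the summary appeals to: sparse noise is incoherent with the model learned from clean data.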
arXiv Detail & Related papers (2022-02-28T18:50:10Z) - Benign Overfitting without Linearity: Neural Network Classifiers Trained
by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained by gradient descent on noisy linear data.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting even noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z) - Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
Benign overfitting refers to the phenomenon where neural network models that perfectly fit noisy data generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
Main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.