On Emergence of Clean-Priority Learning in Early Stopped Neural Networks
- URL: http://arxiv.org/abs/2306.02533v1
- Date: Mon, 5 Jun 2023 01:45:22 GMT
- Title: On Emergence of Clean-Priority Learning in Early Stopped Neural Networks
- Authors: Chaoyue Liu, Amirhesam Abedsoltan, Mikhail Belkin
- Abstract summary: When random label noise is added to a training dataset, the prediction error of a neural network on a label-noise-free test dataset initially improves but eventually deteriorates.
This behaviour is believed to be a result of neural networks learning the pattern of clean data first and fitting the noise later in the training.
We show, both theoretically and experimentally, that as clean-priority learning goes on, the dominance of the gradients of clean samples over those of noisy samples diminishes.
- Score: 18.725557157004214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When random label noise is added to a training dataset, the prediction error
of a neural network on a label-noise-free test dataset initially improves
during early training but eventually deteriorates, following a U-shaped
dependence on training time. This behaviour is believed to be a result of
neural networks learning the pattern of clean data first and fitting the noise
later in the training, a phenomenon that we refer to as clean-priority
learning. In this study, we aim to explore the learning dynamics underlying
this phenomenon. We theoretically demonstrate that, in the early stage of
training, the update direction of gradient descent is determined by the clean
subset of training data, while the noisy subset has minimal to no impact,
resulting in a prioritization of clean learning. Moreover, we show, both
theoretically and experimentally, that as clean-priority learning goes on, the
dominance of the gradients of clean samples over those of noisy samples
diminishes, ultimately leading to the termination of clean-priority learning
and the fitting of the noisy samples.
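The central claim, that the gradient-descent update direction is initially dominated by the clean subset, can be probed empirically. Below is a minimal, hypothetical PyTorch sketch (not the authors' code; all names, hyperparameters, and the synthetic data are illustrative assumptions). It trains a small two-layer ReLU network on data with randomly flipped labels and, at each logging step, compares the norms of the clean-subset and noisy-subset gradients, their alignment with the full gradient, and the error on a noise-free test set.

```python
# Hypothetical sketch: track clean- vs. noisy-subset gradient contributions
# while fitting a small MLP to data with random label noise.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic binary classification data with 20% of labels flipped at random.
n, d, noise_frac = 1000, 20, 0.2
X = torch.randn(n, d)
w_true = torch.randn(d)
y_clean = (X @ w_true > 0).float()
noisy_mask = torch.rand(n) < noise_frac            # samples whose label is flipped
y = torch.where(noisy_mask, 1.0 - y_clean, y_clean)

X_test = torch.randn(2000, d)                      # label-noise-free test set
y_test = (X_test @ w_true > 0).float()

model = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 1))
loss_fn = nn.BCEWithLogitsLoss(reduction="sum")    # sum so subset gradients add up
opt = torch.optim.SGD(model.parameters(), lr=0.05)

def subset_grad(mask):
    """Gradient of the (normalized) training loss restricted to one subset."""
    opt.zero_grad()
    loss = loss_fn(model(X[mask]).squeeze(1), y[mask]) / n
    loss.backward()
    return torch.cat([p.grad.flatten().clone() for p in model.parameters()])

for epoch in range(200):
    g_clean = subset_grad(~noisy_mask)
    g_noisy = subset_grad(noisy_mask)
    g_full = g_clean + g_noisy                     # linearity of the gradient

    # One full-batch gradient-descent step on the noisy training set.
    opt.zero_grad()
    (loss_fn(model(X).squeeze(1), y) / n).backward()
    opt.step()

    if epoch % 20 == 0:
        with torch.no_grad():
            preds = (model(X_test).squeeze(1) > 0).float()
            test_err = (preds != y_test).float().mean()
        cos = torch.nn.functional.cosine_similarity(g_clean, g_full, dim=0)
        print(f"epoch {epoch:3d}  |g_clean|={g_clean.norm().item():.3f}  "
              f"|g_noisy|={g_noisy.norm().item():.3f}  "
              f"cos(g_clean, g_full)={cos.item():.2f}  "
              f"clean test err={test_err.item():.3f}")
```

Under the paper's account, one would expect |g_noisy| to be small relative to |g_clean| early in training, the clean-subset gradient to be well aligned with the full update, and the noise-free test error to trace the U-shaped dependence on training time as the noisy samples are eventually fit.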
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained to interpolation by gradient descent.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that requires linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z)
- When and how epochwise double descent happens [7.512375012141203]
An 'epochwise double descent' effect exists in which the generalization error initially drops, then rises, and finally drops again with increasing training time.
This presents a practical problem in that the amount of time required for training is long, and early stopping based on validation performance may result in suboptimal generalization.
We show that epochwise double descent requires a critical amount of noise to occur, but above a second critical noise level early stopping remains effective.
arXiv Detail & Related papers (2021-08-26T19:19:17Z)
- A Theoretical Analysis of Learning with Noisily Labeled Data [62.946840431501855]
We first show that, in the first epoch of training, the examples with clean labels are learned first.
We then show that, after the stage of learning from clean data, continuing to train the model can further improve the test error.
arXiv Detail & Related papers (2021-04-08T23:40:02Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training process of a neural network.
We present experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)