Stochastic Resetting Mitigates Latent Gradient Bias of SGD from Label Noise
- URL: http://arxiv.org/abs/2406.00396v3
- Date: Tue, 04 Mar 2025 05:51:53 GMT
- Title: Stochastic Resetting Mitigates Latent Gradient Bias of SGD from Label Noise
- Authors: Youngkyoung Bae, Yeongwoo Song, Hawoong Jeong
- Abstract summary: We show that resetting from a checkpoint can significantly improve generalization performance when training deep neural networks (DNNs) with noisy labels. In the presence of noisy labels, DNNs initially learn the general patterns of the data but then gradually memorize the corrupted data, leading to overfitting. By deconstructing the dynamics of stochastic gradient descent (SGD), we identify the behavior of a latent gradient bias induced by noisy labels, which harms generalization.
- Score: 2.048226951354646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Giving up and starting over may seem wasteful in many situations such as searching for a target or training deep neural networks (DNNs). Our study, though, demonstrates that resetting from a checkpoint can significantly improve generalization performance when training DNNs with noisy labels. In the presence of noisy labels, DNNs initially learn the general patterns of the data but then gradually memorize the corrupted data, leading to overfitting. By deconstructing the dynamics of stochastic gradient descent (SGD), we identify the behavior of a latent gradient bias induced by noisy labels, which harms generalization. To mitigate this negative effect, we apply the stochastic resetting method to SGD, inspired by recent developments in the field of statistical physics achieving efficient target searches. We first theoretically identify the conditions where resetting becomes beneficial, and then we empirically validate our theory, confirming the significant improvements achieved by resetting. We further demonstrate that our method is both easy to implement and compatible with other methods for handling noisy labels. Additionally, this work offers insights into the learning dynamics of DNNs from an interpretability perspective, expanding the potential to analyze training methods through the lens of statistical physics.
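To make the idea of resetting from a checkpoint concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not state the reset rule, so the sketch assumes a fixed per-iteration reset probability and a single checkpoint saved at a chosen step. The names `sgd_with_resetting`, `reset_prob`, `checkpoint_step`, and the toy gradient `noisy_quadratic_grad` are hypothetical and used only for illustration.

```python
import random


def sgd_with_resetting(grad_fn, theta0, lr, steps, reset_prob, checkpoint_step, seed=0):
    """SGD that, after a checkpoint is saved, jumps back to it at random.

    Assumption (not from the paper): resets happen independently at each
    iteration with probability `reset_prob`, and the checkpoint is the
    parameter vector at iteration `checkpoint_step`.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    checkpoint = None
    n_resets = 0
    for t in range(steps):
        if t == checkpoint_step:
            checkpoint = list(theta)  # save the parameters to reset to
        # Only consider resetting once a checkpoint exists.
        if checkpoint is not None and t > checkpoint_step and rng.random() < reset_prob:
            theta = list(checkpoint)  # discard progress and restart from the checkpoint
            n_resets += 1
            continue
        grad = grad_fn(theta, rng)  # stochastic gradient estimate
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta, checkpoint, n_resets


def noisy_quadratic_grad(theta, rng):
    # Toy stand-in for a minibatch gradient: gradient of sum(p**2) plus Gaussian noise.
    return [2.0 * p + rng.gauss(0.0, 1.0) for p in theta]


theta, checkpoint, n_resets = sgd_with_resetting(
    noisy_quadratic_grad, [1.0, -1.0], lr=0.05, steps=200,
    reset_prob=0.01, checkpoint_step=50,
)
print(theta, n_resets)
```

Setting `reset_prob=0.0` recovers plain SGD, which gives a simple baseline for comparing runs with and without resetting. In a real training loop the checkpoint would be a saved copy of the model parameters, and how to choose it and the reset probability is a design decision this sketch leaves open.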
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability.
arXiv Detail & Related papers (2023-12-13T17:59:07Z) - Dynamics-Aware Loss for Learning with Label Noise [73.75129479936302]
Label noise poses a serious threat to deep neural networks (DNNs).
We propose a dynamics-aware loss (DAL) to solve this problem.
Both the detailed theoretical analyses and extensive experimental results demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-03-21T03:05:21Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - SGD with Large Step Sizes Learns Sparse Features [22.959258640051342]
We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks.
We show that the longer large step sizes keep SGD high in the loss landscape, the better the implicit regularization can operate and find sparse representations.
arXiv Detail & Related papers (2022-10-11T11:00:04Z) - Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl Noise Dilution (LEND).
arXiv Detail & Related papers (2022-06-27T02:45:09Z) - Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is yet very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data.
Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets.
arXiv Detail & Related papers (2022-02-28T18:50:10Z) - Learning to Rectify for Robust Learning with Noisy Labels [25.149277009932423]
We propose warped probabilistic inference (WarPI) to achieve adaptively rectifying the training procedure for the classification network.
We evaluate WarPI on four benchmarks of robust learning with noisy labels and achieve the new state-of-the-art under variant noise types.
arXiv Detail & Related papers (2021-11-08T02:25:50Z) - Learning from Noisy Labels via Dynamic Loss Thresholding [69.61904305229446]
We propose a novel method named Dynamic Loss Thresholding (DLT).
During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds.
Experiments on CIFAR-10/100 and Clothing1M demonstrate substantial improvements over recent state-of-the-art methods.
arXiv Detail & Related papers (2021-04-01T07:59:03Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets.
However, labeling large-scale data can be very costly and error-prone so that it is difficult to guarantee the annotation quality.
We propose a Temporal Calibrated Regularization (TCR) in which we utilize the original labels and the predictions in the previous epoch together.
arXiv Detail & Related papers (2020-07-01T04:48:49Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to cast this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates a rectified meta-learning module into the common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.