Robust and On-the-fly Dataset Denoising for Image Classification
- URL: http://arxiv.org/abs/2003.10647v2
- Date: Thu, 9 Apr 2020 04:59:33 GMT
- Title: Robust and On-the-fly Dataset Denoising for Image Classification
- Authors: Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma
- Abstract summary: On-the-fly Data Denoising (ODD) is robust to mislabeled examples, while introducing almost zero computational overhead compared to standard training.
ODD is able to achieve state-of-the-art results on a wide range of datasets including real-world ones such as WebVision and Clothing1M.
- Score: 72.10311040730815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Memorization in over-parameterized neural networks could severely hurt
generalization in the presence of mislabeled examples. However, mislabeled
examples are hard to avoid in extremely large datasets collected with weak
supervision. We address this problem by reasoning counterfactually about the
loss distribution of examples with uniform random labels had they been trained
with the real examples, and use this information to remove noisy examples from
the training set. First, we observe that examples with uniform random labels
have higher losses when trained with stochastic gradient descent under large
learning rates. Then, we propose to model the loss distribution of the
counterfactual examples using only the network parameters, which is able to
model such examples with remarkable success. Finally, we propose to remove
examples whose loss exceeds a certain quantile of the modeled loss
distribution. This leads to On-the-fly Data Denoising (ODD), a simple yet
effective algorithm that is robust to mislabeled examples, while introducing
almost zero computational overhead compared to standard training. ODD is able
to achieve state-of-the-art results on a wide range of datasets including
real-world ones such as WebVision and Clothing1M.
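As a rough illustration of the filtering rule described above, the sketch below compares per-example training losses against a quantile of a counterfactual (random-label) loss distribution. The function names, the use of sampled counterfactual losses (the paper models this distribution from network parameters alone), and the 10% quantile are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def odd_style_filter(train_losses, counterfactual_losses, q=0.1):
    """Keep examples whose loss stays below the q-quantile of the modeled
    counterfactual (uniform-random-label) loss distribution; examples
    above it look indistinguishable from mislabeled data and are removed.
    The quantile q is an illustrative knob, not the paper's setting."""
    threshold = np.quantile(counterfactual_losses, q)
    return train_losses < threshold

# Toy usage: clean examples cluster at low loss, while mislabeled ones
# drift toward the random-label regime under SGD with large learning rates.
rng = np.random.default_rng(0)
train_losses = np.concatenate([
    rng.normal(0.4, 0.15, size=900),   # mostly clean
    rng.normal(2.4, 0.30, size=100),   # mislabeled
])
counterfactual_losses = rng.normal(2.4, 0.30, size=1000)

mask = odd_style_filter(train_losses, counterfactual_losses)
print(f"kept {mask.sum()} of {mask.size} examples")
```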
Related papers
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Regularly Truncated M-estimators for Learning with Noisy Labels [79.36560434324586]
We propose regularly truncated M-estimators (RTME) to address the above two issues simultaneously.
Specifically, RTME can alternately switch modes between truncated M-estimators and original M-estimators.
We demonstrate that our strategies are label-noise-tolerant.
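A minimal sketch of the alternation idea, assuming a simple cap-style truncation and an epoch-parity schedule; the actual RTME threshold and switching rule are more involved.

```python
import numpy as np

def truncated_losses(losses, tau):
    """Truncated M-estimation: losses above tau are capped, so likely
    mislabeled (large-loss) examples stop dominating the update."""
    return np.minimum(losses, tau)

def rtme_style_losses(losses, epoch, tau=2.0, period=2):
    """Alternate between the truncated and the original estimator.
    The parity schedule and tau value are illustrative assumptions."""
    if epoch % period == 0:
        return truncated_losses(losses, tau)
    return losses
```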
arXiv Detail & Related papers (2023-09-02T10:22:20Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
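A minimal sketch of the observation the summary describes: record per-epoch correctness and find, for each example, the first epoch from which it stays correctly classified. The logging setup and the "consistently correct" criterion here are assumptions, not the paper's exact rule.

```python
import numpy as np

def first_consistently_correct_epoch(correct_by_epoch):
    """correct_by_epoch: bool array (num_epochs, num_examples);
    entry [t, i] is True if example i was classified correctly at epoch t.
    Returns, per example, the first epoch from which it remains correct
    through the end of training (np.inf if it never does). Mislabeled
    examples tend to reach this point much later than clean ones."""
    num_epochs, num_examples = correct_by_epoch.shape
    first_epoch = np.full(num_examples, np.inf)
    correct_suffix = np.ones(num_examples, dtype=bool)
    for t in range(num_epochs - 1, -1, -1):
        correct_suffix &= correct_by_epoch[t]  # correct at every epoch >= t
        first_epoch[correct_suffix] = t
    return first_epoch
```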
arXiv Detail & Related papers (2023-08-26T12:43:25Z) - Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
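A small sketch of the metric as the summary states it, assuming we log per-epoch correctness of first-split examples while training continues on the second split; "forgotten" is taken here to mean misclassified from some epoch onward, which may not match the paper's exact definition.

```python
import numpy as np

def second_split_forgetting_time(correct_by_epoch):
    """correct_by_epoch: bool array (num_epochs, num_examples) of
    first-split correctness recorded while training on the second split.
    Returns the first epoch from which an example is never correct again
    (np.inf if never forgotten). Per the summary: small values suggest
    mislabeling, larger values suggest rare-but-clean examples."""
    num_epochs, num_examples = correct_by_epoch.shape
    forgotten_at = np.full(num_examples, np.inf)
    correct_later = np.zeros(num_examples, dtype=bool)
    for t in range(num_epochs - 1, -1, -1):
        # Forgotten at t: wrong at t and at every later epoch.
        forgotten_now = ~correct_by_epoch[t] & ~correct_later
        forgotten_at[forgotten_now] = t
        correct_later |= correct_by_epoch[t]
    return forgotten_at
```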
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
- DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrectly labeled examples), deep neural networks gradually memorize the label noise, which impairs model performance.
To mitigate this issue, curriculum learning orders training samples in a meaningful sequence to improve model performance and generalization.
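As a toy illustration of "ordering training samples in a meaningful sequence", here is one common easy-to-hard curriculum based on current loss; this ordering criterion is an assumption for illustration, not DiscrimLoss itself.

```python
import numpy as np

def easy_to_hard_order(losses):
    """Present low-loss ('easy') samples before high-loss ones, which are
    either genuinely hard or mislabeled; distinguishing those two cases
    is exactly the problem DiscrimLoss targets."""
    return np.argsort(losses)

print(easy_to_hard_order(np.array([2.3, 0.1, 0.7])))  # [1 2 0]: easiest first
```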
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
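For context on the small-loss scheme the summary mentions, here is a minimal sketch of small-loss sample selection, the core step of the co-training baselines (not the paper's single-network method); the keep ratio is an illustrative parameter.

```python
import numpy as np

def small_loss_selection(losses, keep_ratio=0.8):
    """Keep the fraction of samples with the smallest losses, on the
    premise that networks fit clean labels before noisy ones."""
    k = int(keep_ratio * len(losses))
    return np.argsort(losses)[:k]
```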
arXiv Detail & Related papers (2022-07-21T08:16:31Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that selects which unlabeled examples to use when training models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
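A minimal sketch of dynamic thresholding for unlabeled-data selection, assuming an exponentially decaying threshold; the schedule and constants are illustrative, not Dash's actual rule.

```python
def select_unlabeled(losses, step, init_threshold=2.0, decay=0.95):
    """Keep unlabeled examples whose (pseudo-label) loss falls below a
    threshold that shrinks as training progresses, so selection becomes
    stricter as the model improves. Constants are illustrative."""
    threshold = init_threshold * (decay ** step)
    return [i for i, loss in enumerate(losses) if loss < threshold]

# Early on the threshold is loose; later only confident examples pass.
print(select_unlabeled([0.3, 1.5, 2.5], step=0))   # [0, 1]
print(select_unlabeled([0.3, 1.5, 2.5], step=20))  # [0]
```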
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- When does loss-based prioritization fail? [18.982933391138268]
We show that loss-based acceleration methods degrade in scenarios with noisy and corrupted data.
Measures of example difficulty need to correctly separate out noise from other types of challenging examples.
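To make the failure mode concrete, here is a minimal sketch of loss-based prioritization; the top-k form is an assumed simplification of such methods, not any specific paper's rule.

```python
import numpy as np

def prioritize_by_loss(losses, k):
    """Select the k highest-loss examples for the next update. Under label
    noise this backfires: mislabeled examples carry the largest losses,
    so they are selected most often and dominate training."""
    return np.argsort(losses)[-k:]
```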
arXiv Detail & Related papers (2021-07-16T07:23:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.