Robust and On-the-fly Dataset Denoising for Image Classification
- URL: http://arxiv.org/abs/2003.10647v2
- Date: Thu, 9 Apr 2020 04:59:33 GMT
- Title: Robust and On-the-fly Dataset Denoising for Image Classification
- Authors: Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma
- Abstract summary: On-the-fly Data Denoising (ODD) is robust to mislabeled examples, while introducing almost zero computational overhead compared to standard training.
ODD is able to achieve state-of-the-art results on a wide range of datasets including real-world ones such as WebVision and Clothing1M.
- Score: 72.10311040730815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Memorization in over-parameterized neural networks could severely hurt
generalization in the presence of mislabeled examples. However, mislabeled
examples are hard to avoid in extremely large datasets collected with weak
supervision. We address this problem by reasoning counterfactually about the
loss distribution of examples with uniform random labels had they been trained
with the real examples, and use this information to remove noisy examples from
the training set. First, we observe that examples with uniform random labels
have higher losses when trained with stochastic gradient descent under large
learning rates. Then, we propose to model the loss distribution of the
counterfactual examples using only the network parameters, which is able to
model such examples with remarkable success. Finally, we propose to remove
examples whose loss exceeds a certain quantile of the modeled loss
distribution. This leads to On-the-fly Data Denoising (ODD), a simple yet
effective algorithm that is robust to mislabeled examples, while introducing
almost zero computational overhead compared to standard training. ODD is able
to achieve state-of-the-art results on a wide range of datasets including
real-world ones such as WebVision and Clothing1M.
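As a rough illustration of the filtering rule described above, the sketch below compares per-example training losses against a quantile of a counterfactual (random-label) loss distribution. The function names, the use of sampled counterfactual losses (the paper models this distribution from network parameters alone), and the 10% quantile are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def odd_style_filter(train_losses, counterfactual_losses, q=0.1):
    """Keep examples whose loss stays below the q-quantile of the modeled
    counterfactual (uniform-random-label) loss distribution; examples
    above it look indistinguishable from mislabeled data and are removed.
    The quantile q is an illustrative knob, not the paper's setting."""
    threshold = np.quantile(counterfactual_losses, q)
    return train_losses < threshold

# Toy usage: clean examples cluster at low loss, while mislabeled ones
# drift toward the random-label regime under SGD with large learning rates.
rng = np.random.default_rng(0)
train_losses = np.concatenate([
    rng.normal(0.4, 0.15, size=900),   # mostly clean
    rng.normal(2.4, 0.30, size=100),   # mislabeled
])
counterfactual_losses = rng.normal(2.4, 0.30, size=1000)

mask = odd_style_filter(train_losses, counterfactual_losses)
print(f"kept {mask.sum()} of {mask.size} examples")
```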
Related papers
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Regularly Truncated M-estimators for Learning with Noisy Labels [79.36560434324586]
We propose regularly truncated M-estimators (RTME) to address the above two issues simultaneously.
Specifically, RTME can alternately switch modes between truncated M-estimators and original M-estimators.
We demonstrate that our strategies are label-noise-tolerant.
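A minimal sketch of the alternation idea, assuming a simple cap-style truncation and an epoch-parity schedule; the actual RTME threshold and switching rule are more involved.

```python
import numpy as np

def truncated_losses(losses, tau):
    """Truncated M-estimation: losses above tau are capped, so likely
    mislabeled (large-loss) examples stop dominating the update."""
    return np.minimum(losses, tau)

def rtme_style_losses(losses, epoch, tau=2.0, period=2):
    """Alternate between the truncated and the original estimator.
    The parity schedule and tau value are illustrative assumptions."""
    if epoch % period == 0:
        return truncated_losses(losses, tau)
    return losses
```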
arXiv Detail & Related papers (2023-09-02T10:22:20Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
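A minimal sketch of the observation the summary describes: record per-epoch correctness and find, for each example, the first epoch from which it stays correctly classified. The logging setup and the "consistently correct" criterion here are assumptions, not the paper's exact rule.

```python
import numpy as np

def first_consistently_correct_epoch(correct_by_epoch):
    """correct_by_epoch: bool array (num_epochs, num_examples);
    entry [t, i] is True if example i was classified correctly at epoch t.
    Returns, per example, the first epoch from which it remains correct
    through the end of training (np.inf if it never does). Mislabeled
    examples tend to reach this point much later than clean ones."""
    num_epochs, num_examples = correct_by_epoch.shape
    first_epoch = np.full(num_examples, np.inf)
    correct_suffix = np.ones(num_examples, dtype=bool)
    for t in range(num_epochs - 1, -1, -1):
        correct_suffix &= correct_by_epoch[t]  # correct at every epoch >= t
        first_epoch[correct_suffix] = t
    return first_epoch
```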
arXiv Detail & Related papers (2023-08-26T12:43:25Z) - Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
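A small sketch of the metric as the summary states it, assuming we log per-epoch correctness of first-split examples while training continues on the second split; "forgotten" is taken here to mean misclassified from some epoch onward, which may not match the paper's exact definition.

```python
import numpy as np

def second_split_forgetting_time(correct_by_epoch):
    """correct_by_epoch: bool array (num_epochs, num_examples) of
    first-split correctness recorded while training on the second split.
    Returns the first epoch from which an example is never correct again
    (np.inf if never forgotten). Per the summary: small values suggest
    mislabeling, larger values suggest rare-but-clean examples."""
    num_epochs, num_examples = correct_by_epoch.shape
    forgotten_at = np.full(num_examples, np.inf)
    correct_later = np.zeros(num_examples, dtype=bool)
    for t in range(num_epochs - 1, -1, -1):
        # Forgotten at t: wrong at t and at every later epoch.
        forgotten_now = ~correct_by_epoch[t] & ~correct_later
        forgotten_at[forgotten_now] = t
        correct_later |= correct_by_epoch[t]
    return forgotten_at
```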
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
- DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrectly labeled examples), deep neural networks gradually memorize the label noise, which impairs model performance.
To mitigate this issue, curriculum learning orders training samples in a meaningful sequence to improve model performance and generalization.
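As a toy illustration of "ordering training samples in a meaningful sequence", here is one common easy-to-hard curriculum based on current loss; this ordering criterion is an assumption for illustration, not DiscrimLoss itself.

```python
import numpy as np

def easy_to_hard_order(losses):
    """Present low-loss ('easy') samples before high-loss ones, which are
    either genuinely hard or mislabeled; distinguishing those two cases
    is exactly the problem DiscrimLoss targets."""
    return np.argsort(losses)

print(easy_to_hard_order(np.array([2.3, 0.1, 0.7])))  # [1 2 0]: easiest first
```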
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
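For context on the small-loss scheme the summary mentions, here is a minimal sketch of small-loss sample selection, the core step of the co-training baselines (not the paper's single-network method); the keep ratio is an illustrative parameter.

```python
import numpy as np

def small_loss_selection(losses, keep_ratio=0.8):
    """Keep the fraction of samples with the smallest losses, on the
    premise that networks fit clean labels before noisy ones."""
    k = int(keep_ratio * len(losses))
    return np.argsort(losses)[:k]
```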
arXiv Detail & Related papers (2022-07-21T08:16:31Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that selects which unlabeled examples to use when training models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
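A minimal sketch of dynamic thresholding for unlabeled-data selection, assuming an exponentially decaying threshold; the schedule and constants are illustrative, not Dash's actual rule.

```python
def select_unlabeled(losses, step, init_threshold=2.0, decay=0.95):
    """Keep unlabeled examples whose (pseudo-label) loss falls below a
    threshold that shrinks as training progresses, so selection becomes
    stricter as the model improves. Constants are illustrative."""
    threshold = init_threshold * (decay ** step)
    return [i for i, loss in enumerate(losses) if loss < threshold]

# Early on the threshold is loose; later only confident examples pass.
print(select_unlabeled([0.3, 1.5, 2.5], step=0))   # [0, 1]
print(select_unlabeled([0.3, 1.5, 2.5], step=20))  # [0]
```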
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- When does loss-based prioritization fail? [18.982933391138268]
We show that loss-based acceleration methods degrade in scenarios with noisy and corrupted data.
Measures of example difficulty need to correctly separate out noise from other types of challenging examples.
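To make the failure mode concrete, here is a minimal sketch of loss-based prioritization; the top-k form is an assumed simplification of such methods, not any specific paper's rule.

```python
import numpy as np

def prioritize_by_loss(losses, k):
    """Select the k highest-loss examples for the next update. Under label
    noise this backfires: mislabeled examples carry the largest losses,
    so they are selected most often and dominate training."""
    return np.argsort(losses)[-k:]
```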
arXiv Detail & Related papers (2021-07-16T07:23:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.