Learning from Noisy Labels via Dynamic Loss Thresholding
- URL: http://arxiv.org/abs/2104.02570v1
- Date: Thu, 1 Apr 2021 07:59:03 GMT
- Title: Learning from Noisy Labels via Dynamic Loss Thresholding
- Authors: Hao Yang, Youzhi Jin, Ziyin Li, Deng-Bao Wang, Lei Miao, Xin Geng,
Min-Ling Zhang
- Abstract summary: We propose a novel method named Dynamic Loss Thresholding (DLT).
During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds.
Experiments on CIFAR-10/100 and Clothing1M demonstrate substantial improvements over recent state-of-the-art methods.
- Score: 69.61904305229446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous studies have shown that deep neural networks (DNNs) can
eventually fit any training data, even data with noisy labels, which results
in poor generalization performance. However, recent studies suggest that DNNs
tend to memorize the data gradually, moving from correctly labeled data to
mislabeled data.
Inspired by this finding, we propose a novel method named Dynamic Loss
Thresholding (DLT). During the training process, DLT records the loss value of
each sample and calculates dynamic loss thresholds. Specifically, DLT compares
the loss value of each sample with the current threshold: samples with smaller
losses are more likely to be clean, and vice versa. DLT then discards the
potentially corrupted labels and further leverages supervised learning
techniques. Experiments on CIFAR-10/100 and
Clothing1M demonstrate substantial improvements over recent state-of-the-art
methods.
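As a rough, hypothetical sketch of the small-loss selection step described above (not the authors' released code; the quantile-based threshold is an illustrative stand-in for the paper's history-based thresholds, and all names are invented):
```python
import numpy as np

def split_by_dynamic_threshold(epoch_losses, clean_quantile=0.7):
    """Split samples into likely-clean / likely-noisy via a loss threshold.

    epoch_losses:   per-sample losses recorded at the current epoch
    clean_quantile: illustrative stand-in; the paper derives its
                    thresholds from the recorded loss history instead
    """
    losses = np.asarray(epoch_losses, dtype=float)
    threshold = np.quantile(losses, clean_quantile)  # recomputed every epoch
    clean_idx = np.flatnonzero(losses <= threshold)  # small loss -> likely clean
    noisy_idx = np.flatnonzero(losses > threshold)   # large loss -> likely mislabeled
    return clean_idx, noisy_idx

# Toy usage: 8 samples; labels of the noisy subset would be discarded.
losses = [0.10, 0.20, 1.50, 0.15, 2.10, 0.30, 0.05, 1.80]
clean, noisy = split_by_dynamic_threshold(losses)
print("likely clean:", clean)  # [0 1 3 5 6]
print("likely noisy:", noisy)  # [2 4 7]
```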
In addition, we investigate two real-world problems for the first time.
Firstly, we propose a novel approach to estimate the noise rates of datasets
based on the loss difference between the early and late training stages of
DNNs. Secondly, we explore the effect of hard samples (which are difficult to
distinguish from mislabeled ones) on the process of learning from noisy
labels.
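The abstract does not give the estimator's form, only the intuition that clean samples are fitted early. A minimal sketch of that idea, assuming per-sample losses are snapshotted at an early and a late epoch (the function name and the drop_ratio cutoff are hypothetical, not the paper's):
```python
import numpy as np

def estimate_noise_rate(early_losses, late_losses, drop_ratio=0.5):
    """Hypothetical noise-rate estimate from early-vs-late loss differences.

    Intuition from the abstract: DNNs fit clean samples first, so clean
    samples' losses drop sharply between early and late training, while
    mislabeled samples' losses drop much less. The paper's actual
    estimator may differ; drop_ratio is an illustrative cutoff.
    """
    early = np.asarray(early_losses, dtype=float)
    late = np.asarray(late_losses, dtype=float)
    # Relative loss drop per sample; a small drop suggests a mislabeled sample.
    drop = (early - late) / np.maximum(early, 1e-8)
    return float(np.mean(drop < drop_ratio))

# Toy usage: 6 samples, the last two barely improve -> estimate = 2/6.
early = [2.0, 1.8, 2.2, 1.9, 2.1, 2.0]
late = [0.2, 0.3, 0.1, 0.25, 1.9, 1.8]
print("estimated noise rate:", estimate_noise_rate(early, late))
```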
Related papers
- On the Benefits of Fine-Grained Loss Truncation: A Case Study on
Factuality in Summarization [25.282499952331094]
Loss Truncation (LT) is an approach that modifies the standard log loss to adaptively remove noisy examples during training.
We show that LT alone yields a considerable number of hallucinated entities on various datasets.
We propose a fine-grained NLL loss and fine-grained data cleaning strategies, and observe improvements in hallucination reduction across some datasets.
arXiv Detail & Related papers (2024-03-09T04:20:26Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for
Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, storing such data is often infeasible in practice due to memory constraints or data privacy issues.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- Dynamics-Aware Loss for Learning with Label Noise [73.75129479936302]
Label noise poses a serious threat to deep neural networks (DNNs).
We propose a dynamics-aware loss (DAL) to solve this problem.
Both the detailed theoretical analyses and extensive experimental results demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-03-21T03:05:21Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since only a few samples are available.
When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- Synergistic Network Learning and Label Correction for Noise-robust Image
Classification [28.27739181560233]
Deep Neural Networks (DNNs) tend to overfit training label noise, resulting in poorer model performance in practice.
We propose a robust label correction framework combining the ideas of small loss selection and noise correction.
We demonstrate our method on both synthetic and real-world datasets with different noise types and rates.
arXiv Detail & Related papers (2022-02-27T23:06:31Z)
- Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data [17.7825114228313]
Corrupted labels and class imbalance are commonly encountered in practically collected training data.
Existing approaches alleviate these issues by adopting a sample re-weighting strategy.
However, biased samples, i.e., samples with corrupted labels and samples from tail classes, commonly co-exist in training data.
arXiv Detail & Related papers (2021-12-30T09:20:07Z)
- Sample Selection with Uncertainty of Losses for Learning with Noisy Labels [145.06552420999986]
In learning with noisy labels, the sample selection approach, which regards small-loss data as correctly labeled during training, is very popular.
However, losses are generated on the fly by a model that is itself being trained with noisy labels, and thus large-loss data are likely, but not certain, to be incorrectly labeled.
In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses.
arXiv Detail & Related papers (2021-06-01T12:53:53Z)
- Identifying Training Stop Point with Noisy Labeled Data [0.0]
We develop an algorithm to find a training stop point (TSP) at or close to the maximum obtainable test accuracy (MOTA).
We validated the robustness of our algorithm (AutoTSP) through several experiments on CIFAR-10, CIFAR-100, and a real-world noisy dataset.
arXiv Detail & Related papers (2020-12-24T20:07:30Z)