Two Wrongs Don't Make a Right: Combating Confirmation Bias in Learning
with Label Noise
- URL: http://arxiv.org/abs/2112.02960v1
- Date: Mon, 6 Dec 2021 12:10:17 GMT
- Title: Two Wrongs Don't Make a Right: Combating Confirmation Bias in Learning
with Label Noise
- Authors: Mingcai Chen, Hao Cheng, Yuntao Du, Ming Xu, Wenyu Jiang, Chongjun
Wang
- Abstract summary: Robust Label Refurbishment (Robust LR) is a new hybrid method that integrates pseudo-labeling and confidence estimation techniques to refurbish noisy labels.
We show that our method successfully alleviates the damage of both label noise and confirmation bias.
For example, Robust LR achieves up to 4.5% absolute top-1 accuracy improvement over the previous best on the real-world noisy dataset WebVision.
- Score: 6.303101074386922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Noisy labels damage the performance of deep networks. For robust learning, a
prominent two-stage pipeline alternates between eliminating possible incorrect
labels and semi-supervised training. However, discarding part of observed
labels could result in a loss of information, especially when the corruption is
not completely random, e.g., class-dependent or instance-dependent. Moreover,
from the training dynamics of a representative two-stage method DivideMix, we
identify the domination of confirmation bias: Pseudo-labels fail to correct a
considerable amount of noisy labels and consequently, the errors accumulate. To
sufficiently exploit information from observed labels and mitigate wrong
corrections, we propose Robust Label Refurbishment (Robust LR), a new hybrid
method that integrates pseudo-labeling and confidence estimation techniques to
refurbish noisy labels. We show that our method successfully alleviates the
damage of both label noise and confirmation bias. As a result, it achieves
state-of-the-art results across datasets and noise types. For example, Robust
LR achieves up to 4.5% absolute top-1 accuracy improvement over the previous
best on the real-world noisy dataset WebVision.
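The abstract does not state the refurbishment rule itself. As a rough illustration only, the sketch below shows the generic convex-combination form of label refurbishment, in which an observed (possibly noisy) label is blended with a model pseudo-label according to a confidence weight. The function name `refurbish`, the per-sample `confidence` weight, and the example numbers are assumptions made for illustration; they are not taken from the paper and may differ from how Robust LR actually computes its targets.

```python
def refurbish(observed, pseudo_label, confidence):
    """Blend an observed label with a pseudo-label (generic sketch).

    observed:     one-hot list for the given (possibly noisy) label
    pseudo_label: list of model-predicted class probabilities
    confidence:   hypothetical weight in [0, 1]; higher means the
                  observed label is trusted more
    """
    if len(observed) != len(pseudo_label):
        raise ValueError("label vectors must have the same length")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # Convex combination: the result is still a valid distribution
    # whenever both inputs sum to 1.
    return [
        confidence * y + (1.0 - confidence) * p
        for y, p in zip(observed, pseudo_label)
    ]


# Illustrative values only: the observed label says class 1, the model
# favours class 0, and the observed label is given low confidence.
observed = [0.0, 1.0, 0.0]
pseudo = [0.7, 0.2, 0.1]
target = refurbish(observed, pseudo, confidence=0.25)
```

With a low confidence the refurbished target leans toward the pseudo-label, while a confidence of 1 keeps the observed label unchanged, so no observed label is discarded outright.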
Related papers
- Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech
Recognition [49.42732949233184]
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition.
Taking noisy labels as ground-truth in the loss function results in suboptimal performance.
We propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels.
arXiv Detail & Related papers (2023-08-12T12:13:52Z)
- Pseudo-Label Noise Suppression Techniques for Semi-Supervised Semantic
Segmentation [21.163070161951868]
Semi-supervised learning (SSL) can reduce the need for large labelled datasets by incorporating unlabelled data into the training.
Current SSL approaches use an initially supervised trained model to generate predictions for unlabelled images, called pseudo-labels.
We use three mechanisms to control pseudo-label noise and errors.
arXiv Detail & Related papers (2022-10-19T09:46:27Z)
- Is your noise correction noisy? PLS: Robustness to label noise with two
stage detection [16.65296285599679]
This paper proposes to improve the correction accuracy of noisy samples once they have been detected.
In many state-of-the-art contributions, a two-phase approach is adopted where the noisy samples are detected before guessing a corrected pseudo-label.
We propose the pseudo-loss, a simple metric that we find to be strongly correlated with pseudo-label correctness on noisy samples.
arXiv Detail & Related papers (2022-10-10T11:32:28Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10 and CIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Label Noise in Adversarial Training: A Novel Perspective to Study Robust
Overfitting [45.58217741522973]
We show that label noise exists in adversarial training.
Such label noise is due to the mismatch between the true label distribution of adversarial examples and the label inherited from clean examples.
We propose a method to automatically calibrate the label to address the label noise and robust overfitting.
arXiv Detail & Related papers (2021-10-07T01:15:06Z)
- An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for
Noisy Labels [0.9699640804685629]
Large-scale datasets tend to contain mislabeled samples that can be memorized by deep neural networks (DNNs).
We present Ensemble Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select clean samples from noisy data.
We evaluate our approach on various image and text classification tasks where the labels have been manually corrupted with different noise ratios.
arXiv Detail & Related papers (2021-07-06T02:14:52Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise arising from auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN.
arXiv Detail & Related papers (2021-05-10T14:43:11Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- Error-Bounded Correction of Noisy Labels [17.510654621245656]
We show that the prediction of a noisy classifier can indeed be a good indicator of whether the label of a training example is clean.
Based on the theoretical result, we propose a novel algorithm that corrects the labels based on the noisy classifier prediction.
We incorporate our label correction algorithm into the training of deep neural networks and train models that achieve superior testing performance on multiple public datasets.
arXiv Detail & Related papers (2020-11-19T19:23:23Z)
- Learning to Purify Noisy Labels via Meta Soft Label Corrector [49.92310583232323]
Recent deep neural networks (DNNs) can easily overfit to biased training data with noisy labels.
A label correction strategy is commonly used to alleviate this issue.
We propose a meta-learning model which could estimate soft labels through a meta-gradient descent step.
arXiv Detail & Related papers (2020-08-03T03:25:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.