Learning with Noisy Labels Revisited: A Study Using Real-World Human
Annotations
- URL: http://arxiv.org/abs/2110.12088v1
- Date: Fri, 22 Oct 2021 22:42:11 GMT
- Authors: Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, and Yang
Liu
- Abstract summary: Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing research on learning with noisy labels mainly focuses on synthetic label noise. Synthetic label noise, though it has clean structures that greatly enable statistical analyses, often fails to model real-world noise patterns. The recent literature has seen several efforts to offer real-world noisy datasets, yet these efforts suffer from two caveats: first, the lack of ground-truth verification makes it hard to theoretically study the properties and treatment of real-world label noise; second, such datasets are often large-scale, which may lead to unfair comparisons of robust methods within reasonable and accessible computation budgets. To better
understand real-world label noise, it is important to establish controllable
and moderate-sized real-world noisy datasets with both ground-truth and noisy
labels. This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N),
equipping the training sets of CIFAR-10 and CIFAR-100 with human-annotated
real-world noisy labels that we collect from Amazon Mechanical Turk. We
quantitatively and qualitatively show that real-world noisy labels follow an
instance-dependent pattern rather than the classically adopted class-dependent
ones. We then initiate an effort to benchmark a subset of existing solutions
using CIFAR-10N and CIFAR-100N. We next study the memorization of model
predictions, which further illustrates the difference between human noise and
class-dependent synthetic noise. We show that real-world noise patterns indeed impose new and outstanding challenges compared to synthetic ones. These
observations require us to rethink the treatment of noisy labels, and we hope the availability of these two datasets will facilitate the development and evaluation of future learning-with-noisy-labels solutions. The corresponding datasets and the leaderboard are publicly available at http://noisylabels.com.
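To make the class-dependent vs. instance-dependent distinction concrete, below is a minimal sketch (not code from the paper; the label arrays are hypothetical stand-ins for paired datasets like CIFAR-10N) of estimating the class-level noise transition matrix T, where T[i, j] = P(noisy = j | clean = i). Purely class-dependent noise is fully described by T; the paper's point is that real-world flip rates also vary across individual examples, which T alone cannot capture.

```python
import numpy as np

def noise_transition_matrix(clean_labels, noisy_labels, num_classes=10):
    """Estimate T[i, j] = P(noisy = j | clean = i) from paired labels.

    `clean_labels` and `noisy_labels` are hypothetical equal-length integer
    arrays, e.g. CIFAR-10 ground truth paired with human annotations.
    """
    T = np.zeros((num_classes, num_classes))
    for c, n in zip(clean_labels, noisy_labels):
        T[c, n] += 1
    # Normalize each row into a conditional distribution; the clip guards
    # against division by zero for classes absent from the data.
    T /= T.sum(axis=1, keepdims=True).clip(min=1)
    return T

# Toy usage with simulated labels; a real study would load the CIFAR-N files.
rng = np.random.default_rng(0)
clean = rng.integers(0, 10, size=50_000)
flip_to = rng.integers(0, 10, size=50_000)
noisy = np.where(rng.random(50_000) < 0.8, clean, flip_to)
T = noise_transition_matrix(clean, noisy)
print(np.round(T, 2))
print("overall noise rate:", (clean != noisy).mean())
```

One diagnostic along these lines: if T estimated on different subsets of the same class (say, easy vs. hard examples) differs markedly, the noise is instance-dependent rather than class-dependent.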
Related papers
- NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns.
We constructed a benchmark dataset to better understand label noise in real-world text classification settings.
Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
- Noisy Label Processing for Classification: A Survey [2.8821062918162146]
In the long, tedious process of data annotation, annotators are prone to making mistakes, resulting in incorrectly labeled images.
It is crucial to combat noisy labels for computer vision tasks, especially for classification tasks.
We propose an algorithm to generate a synthetic label noise pattern guided by real-world data.
arXiv Detail & Related papers (2024-04-05T15:11:09Z)
- Group Benefits Instances Selection for Data Purification [21.977432359384835]
Existing methods for combating label noise are typically designed and tested on synthetic datasets.
We propose a method named GRIP to alleviate the noisy label problem for both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-23T03:06:19Z)
- NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing [26.678589684142548]
Large-scale datasets in the real world inevitably involve label noise.
Deep models can gradually overfit noisy labels and thus degrade generalization performance.
To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance.
arXiv Detail & Related papers (2023-05-18T05:01:04Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors (a minimal sketch of this neighborhood idea appears after this list).
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms existing methods at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
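The two neighborhood-based entries above (Neighborhood Collective Estimation, Learning with Neighbor Consistency) share one intuition: a label is suspect when it disagrees with the labels of nearby examples in feature space. The sketch below is an illustrative simplification of that idea, not either paper's actual algorithm; `features` and `labels` are hypothetical inputs such as penultimate-layer embeddings and possibly-noisy annotations.

```python
import numpy as np

def knn_label_agreement(features, labels, k=10):
    """Fraction of each sample's k nearest neighbors that share its label.

    Low agreement flags a sample as a label-noise candidate.
    """
    # Pairwise squared Euclidean distances (fine for moderate n).
    sq = (features ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    return (labels[neighbors] == labels[:, None]).mean(axis=1)

# Toy usage: two Gaussian clusters with 10% of labels flipped.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, (200, 8)), rng.normal(4.0, 1.0, (200, 8))])
y = np.repeat([0, 1], 200)
flipped = rng.random(400) < 0.1
y_noisy = np.where(flipped, 1 - y, y)
agreement = knn_label_agreement(X, y_noisy)
suspects = agreement < 0.5
print("flagged:", suspects.sum(), "| true flips among flagged:", (suspects & flipped).sum())
```

Both papers go further, e.g. by using neighbors to correct labels or to regularize predictions, but an agreement score like this is the common starting point.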