Label Noise Types and Their Effects on Deep Learning
- URL: http://arxiv.org/abs/2003.10471v1
- Date: Mon, 23 Mar 2020 18:03:39 GMT
- Title: Label Noise Types and Their Effects on Deep Learning
- Authors: Görkem Algan, İlkay Ulusoy
- Abstract summary: In this work, we provide a detailed analysis of the effects of different kinds of label noise on learning.
We propose a generic framework to generate feature-dependent label noise, which we show to be the most challenging case for learning.
To make it easier for other researchers to test their algorithms with noisy labels, we share corrupted labels for the most commonly used benchmark datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent success of deep learning is mostly due to the availability of big
datasets with clean annotations. However, gathering a cleanly annotated dataset
is not always feasible due to practical challenges. As a result, label noise is
a common problem in datasets, and numerous methods to train deep neural
networks in the presence of noisy labels are proposed in the literature. These
methods commonly use benchmark datasets with synthetic label noise on the
training set. However, there are multiple types of label noise, and each of
them has its own characteristic impact on learning. Since each work generates a
different kind of label noise, it is problematic to test and compare those
algorithms in the literature fairly. In this work, we provide a detailed
analysis of the effects of different kinds of label noise on learning.
Moreover, we propose a generic framework to generate feature-dependent label
noise, which we show to be the most challenging case for learning. Our proposed
method aims to emphasize similarities among data instances by sparsely
distributing them in the feature domain. By this approach, samples that are
more likely to be mislabeled are detected from their softmax probabilities, and
their labels are flipped to the corresponding class. The proposed method can be
applied to any clean dataset to synthesize feature-dependent noisy labels. To make it easier for other researchers to test their algorithms with noisy labels, we share corrupted labels for the most commonly used benchmark datasets. Our code
and generated noisy synthetic labels are available online.
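The flipping step the abstract describes (detecting likely-mislabeled samples from softmax probabilities and flipping them to the confusable class) can be illustrated with a minimal sketch. This is an assumption-laden reconstruction, not the authors' exact method: `flip_feature_dependent` and the toy probabilities below are hypothetical, and it assumes the softmax outputs of a model already trained on the clean set are available.

```python
import numpy as np

def flip_feature_dependent(probs, labels, noise_rate):
    """Sketch: flip labels of the samples most confused with another class.

    probs: (n, k) softmax outputs of a model trained on the clean set
    labels: (n,) clean integer labels
    noise_rate: fraction of labels to corrupt
    Samples whose probability mass leaks most to a wrong class are
    flipped to that class, mimicking feature-dependent noise.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels).copy()
    n = len(labels)
    # probability assigned to the most likely *wrong* class per sample
    masked = probs.copy()
    masked[np.arange(n), labels] = -np.inf
    rival = masked.argmax(axis=1)           # most confusable wrong class
    rival_p = masked[np.arange(n), rival]   # its softmax probability
    n_flip = int(round(noise_rate * n))
    flip_idx = np.argsort(-rival_p)[:n_flip]  # most ambiguous samples first
    labels[flip_idx] = rival[flip_idx]
    return labels

# toy demo: 3 classes, 6 samples, corrupt one third of the labels
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.55, 0.05],   # ambiguous: leans toward class 1
    [0.10, 0.80, 0.10],
    [0.45, 0.10, 0.45],   # ambiguous: ties with class 2
    [0.05, 0.05, 0.90],
    [0.60, 0.35, 0.05],
])
clean = np.array([0, 0, 1, 0, 2, 0])
noisy = flip_feature_dependent(probs, clean, noise_rate=1 / 3)
print((noisy != clean).sum())  # 2 labels flipped
```

Because the flipped class is the one the model itself confuses with the true class, the resulting noise correlates with the features rather than being uniform, which is what makes this noise type hard to learn under.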
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By specifying a probability measure manually, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Generating the Ground Truth: Synthetic Data for Soft Label and Label Noise Research [0.0]
We introduce SYNLABEL, a framework designed to create noiseless datasets informed by real-world data.
We demonstrate its ability to precisely quantify label noise and its improvement over existing methodologies.
arXiv Detail & Related papers (2023-09-08T13:31:06Z) - NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing [26.678589684142548]
Large-scale datasets in the real world inevitably involve label noise.
Deep models can gradually overfit noisy labels and thus degrade generalization performance.
To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance.
arXiv Detail & Related papers (2023-05-18T05:01:04Z) - Rethinking the Value of Labels for Instance-Dependent Label Noise Learning [43.481591776038144]
Noisy labels in real-world applications often depend on both the true label and the features.
In this work, we tackle instance-dependent label noise with a novel deep generative model that avoids explicitly modeling the noise transition matrix.
Our algorithm leverages causal representation learning and simultaneously identifies the high-level content and style latent factors from the data.
arXiv Detail & Related papers (2023-05-10T15:29:07Z) - Tripartite: Tackle Noisy Labels by a More Precise Partition [21.582850128741022]
We propose a Tripartite solution to partition training data more precisely into three subsets: hard, noisy, and clean.
To minimize the harm of noisy labels while maximizing the value of noisy-label data, we apply low-weight learning to hard data and self-supervised learning to noisy-label data without using the given labels.
arXiv Detail & Related papers (2022-02-19T11:15:02Z) - Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis [69.48582264712854]
We propose a robust learning method to perform robust visual sentiment analysis.
Our method relies on an external memory to aggregate and filter noisy labels during training.
We establish a benchmark for visual sentiment analysis with label noise using publicly available datasets.
arXiv Detail & Related papers (2021-09-15T18:18:28Z) - Robust Long-Tailed Learning under Label Noise [50.00837134041317]
This work investigates the label noise problem under long-tailed label distribution.
We propose a robust framework that realizes noise detection for long-tailed learning.
Our framework can naturally leverage semi-supervised learning algorithms to further improve the generalisation.
arXiv Detail & Related papers (2021-08-26T03:45:00Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z) - Noisy Labels Can Induce Good Representations [53.47668632785373]
We study how architecture affects learning with noisy labels.
We show that training with noisy labels can induce useful hidden representations, even when the model generalizes poorly.
This finding leads to a simple method to improve models trained on noisy labels.
arXiv Detail & Related papers (2020-12-23T18:58:05Z) - EvidentialMix: Learning with Combined Open-set and Closed-set Noisy Labels [30.268962418683955]
We study a new variant of the noisy-label problem that combines open-set and closed-set noisy labels.
Our results show that our method produces superior classification results and better feature representations than previous state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:15:32Z) - Multi-Class Classification from Noisy-Similarity-Labeled Data [98.13491369929798]
We propose a method for learning from only noisy-similarity-labeled data.
We use a noise transition matrix to bridge the class-posterior probability between clean and noisy data.
We build a novel learning system which can assign noise-free class labels for instances.
arXiv Detail & Related papers (2020-02-16T05:10:21Z)
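The noise transition matrix mentioned in the last entry maps the class posterior on clean labels to the posterior on noisy labels. A minimal sketch, with an assumed 3-class matrix T (the cited paper estimates T from noisy-similarity-labeled data rather than fixing it by hand):

```python
import numpy as np

# Hypothetical noise transition matrix: T[i, j] = P(noisy label = j | true label = i).
# Each row sums to 1; the diagonal is the per-class probability of a correct label.
T = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

# A model's class-posterior estimate on clean labels for one instance.
clean_posterior = np.array([0.7, 0.2, 0.1])

# "Forward" mapping: the posterior over noisy labels implied by T.
noisy_posterior = clean_posterior @ T
print(noisy_posterior)  # approximately [0.59, 0.24, 0.17]
```

Because T is row-stochastic, the mapped vector is still a valid distribution; methods in this family typically train against the noisy posterior while recovering the clean one at test time.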
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.