A Realistic Simulation Framework for Learning with Label Noise
- URL: http://arxiv.org/abs/2107.11413v1
- Date: Fri, 23 Jul 2021 18:53:53 GMT
- Title: A Realistic Simulation Framework for Learning with Label Noise
- Authors: Keren Gu, Xander Masotto, Vandana Bachani, Balaji Lakshminarayanan,
Jack Nikodem, Dong Yin
- Abstract summary: We show that this framework generates synthetic noisy labels that exhibit important characteristics of the label noise.
We also benchmark several existing algorithms for learning with noisy labels.
We propose a new technique, Label Quality Model (LQM), that leverages annotator features to predict and correct against noisy labels.
- Score: 17.14439597393087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a simulation framework for generating realistic instance-dependent
noisy labels via a pseudo-labeling paradigm. We show that this framework
generates synthetic noisy labels that exhibit important characteristics of the
label noise in practical settings via comparison with the CIFAR10-H dataset.
Equipped with controllable label noise, we study the negative impact of noisy
labels across a few realistic settings to understand when label noise is more
problematic. We also benchmark several existing algorithms for learning with
noisy labels and compare their behavior on our synthetic datasets and on the
datasets with independent random label noise. Additionally, with the
availability of annotator information from our simulation framework, we propose
a new technique, Label Quality Model (LQM), that leverages annotator features
to predict and correct against noisy labels. We show that by adding LQM as a
label correction step before applying existing noisy label techniques, we can
further improve the models' performance.
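The contrast the abstract draws between instance-dependent noise and independent random noise can be illustrated with a small sketch. This is not the authors' framework: the linear-softmax "rater" model, the Gaussian toy data, and every function name below are assumptions chosen for illustration. A noisy label is sampled from the rater's predicted class distribution (a simple form of pseudo-labeling), and the result is compared with labels flipped uniformly at random.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def instance_dependent_labels(features, rater_weights, rng):
    """Sample one label per instance from a hypothetical rater model.

    The rater is a plain linear-softmax classifier, so the chance of a
    wrong label depends on where the instance sits in feature space.
    """
    probs = softmax(features @ rater_weights)
    num_classes = probs.shape[1]
    labels = np.array([rng.choice(num_classes, p=p) for p in probs])
    return labels, probs


def random_flip_labels(true_labels, noise_rate, num_classes, rng):
    """Flip each label with a fixed probability, ignoring the features."""
    flip = rng.random(len(true_labels)) < noise_rate
    # Offsets in 1..num_classes-1 guarantee a flipped label differs from the true one.
    offsets = rng.integers(1, num_classes, size=len(true_labels))
    return np.where(flip, (true_labels + offsets) % num_classes, true_labels)


# Toy data (assumption): three 2-D Gaussian classes with unit variance.
rng = np.random.default_rng(0)
means = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
true_labels = rng.integers(0, 3, size=3000)
features = means[true_labels] + rng.normal(size=(3000, 2))

# Hypothetical rater: one weight column per class, set to the class mean.
rater_weights = means.T
noisy_id, probs = instance_dependent_labels(features, rater_weights, rng)
noisy_rand = random_flip_labels(true_labels, 0.2, 3, rng)

# Split instances by how confident the rater is about them.
ambiguous = probs.max(axis=1) < 0.9
wrong_id = noisy_id != true_labels
wrong_rand = noisy_rand != true_labels

print("instance-dependent noise rate (ambiguous, confident):",
      wrong_id[ambiguous].mean(), wrong_id[~ambiguous].mean())
print("random-flip noise rate (ambiguous, confident):",
      wrong_rand[ambiguous].mean(), wrong_rand[~ambiguous].mean())
```

In this toy setting, the instance-dependent errors concentrate on instances the rater finds ambiguous, while random flips are spread across all instances regardless of difficulty. A label-correction step such as the LQM described in the abstract would operate on labels of the first kind; it is not implemented here.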
Related papers
- NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns.
We constructed a benchmark dataset to better understand label noise in real-world text classification settings.
Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
- Generating the Ground Truth: Synthetic Data for Soft Label and Label Noise Research [0.0]
We introduce SYNLABEL, a framework designed to create noiseless datasets informed by real-world data.
We demonstrate its ability to precisely quantify label noise and its improvement over existing methodologies.
arXiv Detail & Related papers (2023-09-08T13:31:06Z)
- Rethinking the Value of Labels for Instance-Dependent Label Noise Learning [43.481591776038144]
Noisy labels in real-world applications often depend on both the true label and the features.
In this work, we tackle instance-dependent label noise with a novel deep generative model that avoids explicitly modeling the noise transition matrix.
Our algorithm leverages causal representation learning and simultaneously identifies the high-level content and style latent factors from the data.
arXiv Detail & Related papers (2023-05-10T15:29:07Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since there are only a few available samples.
When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10 and CIFAR100 with artificial noise and on real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise degrades the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
- Rethinking Noisy Label Models: Labeler-Dependent Noise with Adversarial Awareness [2.1930130356902207]
We propose a principled model of label noise that generalizes instance-dependent noise to multiple labelers.
Under our labeler-dependent model, label noise manifests itself under two modalities: natural error of good-faith labelers, and adversarial labels provided by malicious actors.
We present two adversarial attack vectors that more accurately reflect the label noise that may be encountered in real-world settings.
arXiv Detail & Related papers (2021-05-28T19:58:18Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Extended T: Learning with Mixed Closed-set and Open-set Noisy Labels [86.5943044285146]
The label noise transition matrix $T$ reflects the probabilities that true labels flip into noisy ones.
In this paper, we focus on learning under the mixed closed-set and open-set label noise.
Our method can better model the mixed label noise, as shown by its more robust performance compared with prior state-of-the-art label-noise learning methods.
arXiv Detail & Related papers (2020-12-02T02:42:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.