NoiseRank: Unsupervised Label Noise Reduction with Dependence Models
- URL: http://arxiv.org/abs/2003.06729v1
- Date: Sun, 15 Mar 2020 01:10:25 GMT
- Title: NoiseRank: Unsupervised Label Noise Reduction with Dependence Models
- Authors: Karishma Sharma, Pinar Donmez, Enming Luo, Yan Liu, I. Zeki Yalniz
- Abstract summary: We propose NoiseRank for unsupervised label noise reduction using Markov Random Fields (MRF).
We construct a dependence model to estimate the posterior probability of an instance being incorrectly labeled given the dataset, and rank instances based on their estimated probabilities.
NoiseRank improves state-of-the-art classification on Food101-N (~20% noise) and is effective on the high-noise Clothing-1M dataset (~40% noise).
- Score: 11.08987870095179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Label noise is increasingly prevalent in datasets acquired from noisy
channels. Existing approaches that detect and remove label noise generally rely
on some form of supervision, which is not scalable and is error-prone. In this
paper, we propose NoiseRank for unsupervised label noise reduction using
Markov Random Fields (MRF). We construct a dependence model to estimate the
posterior probability of an instance being incorrectly labeled given the
dataset, and rank instances based on their estimated probabilities. Our method
1) does not require supervision from ground-truth labels or priors on the label
or noise distribution; 2) is interpretable by design, enabling transparency in
label noise removal; and 3) is agnostic to the classifier architecture,
optimization framework, and content modality. These advantages enable wide
applicability in real noise settings, unlike prior works that are constrained
by one or more of these conditions. NoiseRank improves state-of-the-art
classification on Food101-N (~20% noise) and is effective on the high-noise
Clothing-1M dataset (~40% noise).
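To make the ranking idea above concrete, the following is a minimal sketch in which a kernel-weighted kNN disagreement vote over the dataset stands in for the paper's MRF dependence model. The feature extractor is assumed to be given, and the function name, neighborhood size k, and Gaussian kernel are illustrative assumptions rather than the exact formulation in the paper.

```python
# Hedged sketch: rank instances by an unsupervised "mislabel" score.
# A kernel-weighted kNN disagreement vote stands in for the MRF dependence
# model; rank_label_noise, k, and the kernel choice are assumptions.
import numpy as np

def rank_label_noise(features: np.ndarray, labels: np.ndarray, k: int = 10):
    """Return indices sorted from most to least likely mislabeled, plus scores."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                        # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]                # k nearest neighbors
    nbr_d2 = np.take_along_axis(d2, nbrs, axis=1)
    w = np.exp(-nbr_d2 / (np.median(nbr_d2) + 1e-12))   # kernel-weighted votes
    disagree = (labels[nbrs] != labels[:, None]).astype(float)
    score = (w * disagree).sum(1) / (w.sum(1) + 1e-12)  # ~ P(label is wrong)
    return np.argsort(-score), score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(4, 1, (100, 8))])
    y = np.array([0] * 100 + [1] * 100)
    flipped = rng.choice(200, 20, replace=False)        # inject ~10% label noise
    y[flipped] = 1 - y[flipped]
    ranked, _ = rank_label_noise(X, y)
    print("true flips among top-20 suspects:", np.isin(ranked[:20], flipped).sum())
```

Instances at the top of the ranking are candidates for removal (or down-weighting) before retraining the classifier, which is what keeps such a procedure agnostic to the classifier architecture.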
Related papers
- AlleNoise: large-scale text classification benchmark dataset with real-world label noise [40.11095094521714]
We present AlleNoise, a new curated text classification benchmark dataset with real-world instance-dependent label noise.
The noise distribution comes from actual users of a major e-commerce marketplace, so it realistically reflects the semantics of human mistakes.
We demonstrate that a representative selection of established methods for learning with noisy labels is inadequate to handle such real-world noise.
arXiv Detail & Related papers (2024-06-24T09:29:14Z)
- Label Noise: Correcting the Forward-Correction [0.0]
Training neural network classifiers on datasets with label noise poses a risk of overfitting them to the noisy labels.
We propose imposing a lower bound on the training loss to mitigate this overfitting (a hedged sketch of such a bound follows this entry).
arXiv Detail & Related papers (2023-07-24T19:41:19Z)
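The lower-bound idea above can be illustrated with a "flooded" loss, one common way to keep the training loss from collapsing to zero on noisy labels; the bound value b and the PyTorch phrasing are assumptions, not necessarily the paper's exact correction.

```python
# Hedged sketch of a lower-bounded ("flooded") training loss: once the batch
# loss falls below b, the sign of the gradient flips and pushes it back up,
# which discourages memorization of noisy labels. b is an assumed hyperparameter.
import torch
import torch.nn.functional as F

def flooded_cross_entropy(logits: torch.Tensor,
                          targets: torch.Tensor,
                          b: float = 0.3) -> torch.Tensor:
    """Cross-entropy that is not allowed to settle below the bound b."""
    loss = F.cross_entropy(logits, targets)
    return (loss - b).abs() + b   # equals loss above b, reflects it below b
```

The expression equals the plain loss whenever the loss exceeds b and only activates below it; in practice b would be tuned on held-out data.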
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, which easily gives rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors (see the sketch after this entry).
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
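A rough sketch of the neighborhood idea above: a sample's reliability is re-estimated as the probability mass its feature-space neighbors collectively assign to its given label. The function name, the simple averaging rule, and k are illustrative assumptions.

```python
# Hedged sketch: re-estimate per-sample reliability by contrasting a sample's
# given label with the averaged predicted distributions of its k nearest
# feature-space neighbors. Low-reliability samples can be down-weighted or
# relabeled; the exact rule is a design choice not fixed by the abstract.
import numpy as np

def neighborhood_reliability(features: np.ndarray,
                             probs: np.ndarray,    # (n, C) model class probabilities
                             labels: np.ndarray,   # (n,) observed, possibly noisy
                             k: int = 10) -> np.ndarray:
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    collective = probs[nbrs].mean(axis=1)           # neighbors' collective estimate
    return collective[np.arange(len(labels)), labels]
```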
- Robustness to Label Noise Depends on the Shape of the Noise Distribution in Feature Space [6.748225062396441]
We show that both the scale and the shape of the noise distribution influence the posterior likelihood.
We show that when the noise distribution targets decision boundaries, classification robustness can drop off even at a small scale of noise.
arXiv Detail & Related papers (2022-06-02T15:41:59Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N) with real-world human annotations.
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms existing methods at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
- Learning with Feature-Dependent Label Noise: A Progressive Approach [19.425199841491246]
We propose a new family of feature-dependent label noise, which is much more general than commonly used i.i.d. label noise.
We provide theoretical guarantees showing that, for a wide variety of (unknown) noise patterns, a classifier trained with the proposed progressive correction strategy converges to be consistent with the Bayes classifier (a hedged sketch of progressive correction follows this entry).
arXiv Detail & Related papers (2021-03-13T17:34:22Z)
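One way to read the progressive strategy mentioned above is iterative label correction with a confidence bar that is gradually relaxed over training; the threshold schedule and names below are assumptions for illustration.

```python
# Hedged sketch of progressive label correction: after each epoch, flip a
# training label to the model's prediction only when the model confidently
# disagrees with it; the confidence bar is lowered as training progresses.
import numpy as np

def progressive_correction(probs: np.ndarray,    # (n, C) predicted probabilities
                           labels: np.ndarray,   # (n,) current training labels
                           epoch: int,
                           theta0: float = 0.95,
                           decay: float = 0.01) -> np.ndarray:
    threshold = max(0.5, theta0 - decay * epoch)         # relax the bar over epochs
    pred, conf = probs.argmax(axis=1), probs.max(axis=1)
    flip = (pred != labels) & (conf >= threshold)
    corrected = labels.copy()
    corrected[flip] = pred[flip]
    return corrected
```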
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- Extended T: Learning with Mixed Closed-set and Open-set Noisy Labels [86.5943044285146]
The label noise transition matrix $T$ reflects the probabilities that true labels flip into noisy ones.
In this paper, we focus on learning under the mixed closed-set and open-set label noise.
Our method better models the mixed label noise, as reflected in its more robust performance compared with prior state-of-the-art label-noise learning methods (a hedged sketch of loss correction with a transition matrix follows this entry).
arXiv Detail & Related papers (2020-12-02T02:42:45Z)
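To make the role of the transition matrix $T$ above concrete, here is a hedged sketch of standard forward loss correction with a known closed-set T; the extension to mixed closed-set and open-set noise studied in the paper is not shown, and the example matrix values are assumptions.

```python
# Hedged sketch of forward loss correction with a transition matrix T, where
# T[i, j] = P(noisy label = j | true label = i): the model's clean-class
# posterior is pushed through T before scoring it against the noisy labels.
import torch
import torch.nn.functional as F

def forward_corrected_loss(logits: torch.Tensor,
                           noisy_targets: torch.Tensor,
                           T: torch.Tensor) -> torch.Tensor:
    clean_probs = F.softmax(logits, dim=1)     # estimate of the true-label posterior
    noisy_probs = clean_probs @ T              # implied noisy-label distribution
    return F.nll_loss(torch.log(noisy_probs + 1e-12), noisy_targets)

# Illustrative closed-set T: 3 classes, symmetric 20% noise (rows sum to 1).
T = torch.full((3, 3), 0.1) + 0.7 * torch.eye(3)
```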
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.