A Good Representation Detects Noisy Labels
- URL: http://arxiv.org/abs/2110.06283v1
- Date: Tue, 12 Oct 2021 19:10:30 GMT
- Title: A Good Representation Detects Noisy Labels
- Authors: Zhaowei Zhu, Zihao Dong, Hao Cheng, Yang Liu
- Abstract summary: Label noise is pervasive in real-world datasets; it encodes wrong correlation patterns and impairs the generalization of deep neural networks (DNNs).
We propose a universally applicable and training-free solution to detect noisy labels.
Experiments with both synthetic and real-world label noise demonstrate that our training-free solutions consistently and significantly improve over most training-based baselines.
- Score: 9.4092903583089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Label noise is pervasive in real-world datasets, which encodes wrong
correlation patterns and impairs the generalization of deep neural networks
(DNNs). It is critical to find efficient ways to detect the corrupted patterns.
Current methods primarily focus on designing robust training techniques to
prevent DNNs from memorizing corrupted patterns. This approach has two
outstanding caveats: 1) applying this approach to each individual dataset would
often require customized training processes; 2) as long as the model is trained
with noisy supervisions, overfitting to corrupted patterns is often hard to
avoid, leading to performance drop in detection. In this paper, given good
representations, we propose a universally applicable and training-free solution
to detect noisy labels. Intuitively, good representations help define
"neighbors" of each training instance, and closer instances are more likely
to share the same clean label. Based on the neighborhood information, we
propose two methods: the first one uses "local voting" by checking the noisy
label consensus of nearby representations. The second one is a ranking-based
approach that scores each instance and filters out a guaranteed number of
instances that are likely to be corrupted, again using only representations.
Given good (but possibly imperfect) representations that are commonly available
in practice, we theoretically analyze how they affect the local voting and
provide guidelines for tuning neighborhood size. We also prove the worst-case
error bound for the ranking-based method. Experiments with both synthetic and
real-world label noise demonstrate that our training-free solutions consistently
and significantly improve over most of the training-based baselines. Code is
available at github.com/UCSC-REAL/SimiRep.
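The two representation-based detectors described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation (see the SimiRep repository for that): the function names, the Euclidean k-NN, and the disagreement score used for ranking are choices made here for clarity.

```python
import numpy as np

def knn_indices(features, k):
    """Indices of the k nearest neighbors (Euclidean) of each row, excluding self."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]

def local_voting(features, noisy_labels, k=10):
    """Method 1 (local voting): flag an instance when the majority noisy label
    among its k nearest representations disagrees with its own noisy label."""
    nn = knn_indices(features, k)
    flags = np.empty(len(noisy_labels), dtype=bool)
    for i, idx in enumerate(nn):
        majority = np.bincount(noisy_labels[idx]).argmax()
        flags[i] = majority != noisy_labels[i]
    return flags

def rank_by_disagreement(features, noisy_labels, k=10):
    """Method 2 (ranking): score each instance by the fraction of neighbors whose
    noisy label differs from its own; return indices sorted most-suspicious first.
    Taking the top m entries filters out a guaranteed number of instances."""
    nn = knn_indices(features, k)
    scores = np.array([(noisy_labels[idx] != noisy_labels[i]).mean()
                       for i, idx in enumerate(nn)])
    return np.argsort(-scores)
```

On a toy dataset of two well-separated clusters with one flipped label, `local_voting` flags exactly the corrupted instance, and `rank_by_disagreement` places it first; the paper's analysis concerns how the quality of `features` and the choice of `k` affect this behavior.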
Related papers
- ROG$_{PL}$: Robust Open-Set Graph Learning via Region-Based Prototype
Learning [52.60434474638983]
We propose a unified framework named ROG$_PL$ to achieve robust open-set learning on complex noisy graph data.
The framework consists of two modules, i.e., denoising via label propagation and open-set prototype learning via regions.
To the best of our knowledge, the proposed ROG$_PL$ is the first robust open-set node classification method for graph data with complex noise.
arXiv Detail & Related papers (2024-02-28T17:25:06Z) - Is your noise correction noisy? PLS: Robustness to label noise with two
stage detection [16.65296285599679]
This paper proposes to improve the correction accuracy of noisy samples once they have been detected.
In many state-of-the-art contributions, a two phase approach is adopted where the noisy samples are detected before guessing a corrected pseudo-label.
We propose the pseudo-loss, a simple metric that we find to be strongly correlated with pseudo-label correctness on noisy samples.
arXiv Detail & Related papers (2022-10-10T11:32:28Z) - Neighborhood Collective Estimation for Noisy Label Identification and
Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z) - Context-based Virtual Adversarial Training for Text Classification with
Noisy Labels [1.9508698179748525]
We propose context-based virtual adversarial training (ConVAT) to prevent a text classifier from overfitting to noisy labels.
Unlike the previous works, the proposed method performs the adversarial training at the context level rather than the inputs.
We conduct extensive experiments on four text classification datasets with two types of label noises.
arXiv Detail & Related papers (2022-05-29T14:19:49Z) - Synergistic Network Learning and Label Correction for Noise-robust Image
Classification [28.27739181560233]
Deep Neural Networks (DNNs) tend to overfit training label noise, resulting in poorer model performance in practice.
We propose a robust label correction framework combining the ideas of small loss selection and noise correction.
We demonstrate our method on both synthetic and real-world datasets with different noise types and rates.
arXiv Detail & Related papers (2022-02-27T23:06:31Z) - Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z) - Robust Long-Tailed Learning under Label Noise [50.00837134041317]
This work investigates the label noise problem under long-tailed label distribution.
We propose a robust framework, algo, that realizes noise detection for long-tailed learning.
Our framework can naturally leverage semi-supervised learning algorithms to further improve the generalization.
arXiv Detail & Related papers (2021-08-26T03:45:00Z) - Unified Robust Training for Graph Neural Networks against Label Noise [12.014301020294154]
We propose a new framework, UnionNET, for learning with noisy labels on graphs under a semi-supervised setting.
Our approach provides a unified solution for robustly training GNNs and performing label correction simultaneously.
arXiv Detail & Related papers (2021-03-05T01:17:04Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic
Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z) - Noisy Labels Can Induce Good Representations [53.47668632785373]
We study how architecture affects learning with noisy labels.
We show that training with noisy labels can induce useful hidden representations, even when the model generalizes poorly.
This finding leads to a simple method to improve models trained on noisy labels.
arXiv Detail & Related papers (2020-12-23T18:58:05Z) - EvidentialMix: Learning with Combined Open-set and Closed-set Noisy
Labels [30.268962418683955]
We study a new variant of the noisy label problem that combines the open-set and closed-set noisy labels.
Our results show that our method produces superior classification results and better feature representations than previous state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:15:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.