An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels
- URL: http://arxiv.org/abs/2107.02347v1
- Date: Tue, 6 Jul 2021 02:14:52 GMT
- Title: An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels
- Authors: Yong Wen, Marcus Kalander, Chanfei Su, Lujia Pan
- Abstract summary: Large-scale datasets tend to contain mislabeled samples that can be memorized by deep neural networks (DNNs).
We present Ensemble Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select clean samples from noisy data.
We evaluate our approach on various image and text classification tasks where the labels have been manually corrupted with different noise ratios.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of training robust and accurate deep neural networks
(DNNs) when subject to various proportions of noisy labels. Large-scale
datasets tend to contain mislabeled samples that can be memorized by DNNs,
impeding the performance. With appropriate handling, this degradation can be
alleviated. There are two problems to consider: how to distinguish clean
samples and how to deal with noisy samples. In this paper, we present Ensemble
Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select
clean samples from noisy data, solving the first problem. For the second
problem, we create a new pseudo label for any sample determined to have an
uncertain or likely corrupt label. E-NKCVS obtains multiple predicted labels
for each sample and the entropy of these labels is used to tune the weight
given to the pseudo label and the given label. Theoretical analysis and
extensive verification of the algorithms in the noisy label setting are
provided. We evaluate our approach on various image and text classification
tasks where the labels have been manually corrupted with different noise
ratios. Additionally, two large real-world noisy datasets, Clothing-1M and
WebVision, are used. E-NKCVS is empirically shown to be highly tolerant to
considerable proportions of label noise and consistently improves over
state-of-the-art methods. The improvement over the second-best model is
especially significant on more difficult datasets with higher noise ratios.
Moreover, our proposed approach can easily be integrated
into existing DNN methods to improve their robustness against label noise.
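The abstract names two steps: K-fold cross-validation selection of clean samples, and an entropy-based weighting between a pseudo label and the given label. The sketch below illustrates both under loudly stated assumptions, since the abstract does not give the exact rules. A toy nearest-centroid classifier stands in for the DNN. The "multiple predicted labels" are assumed to come from repeating the K-fold pass with different random splits. A sample is assumed clean when the majority-vote prediction matches its given label. The weight on the pseudo label is assumed to be 1 minus the normalized entropy of the votes. All function names (`kfold_predictions`, `normalized_entropy`, `select_and_relabel`) are hypothetical. This is not the authors' implementation.

```python
import math
import random
from collections import Counter

# Toy stand-in for a DNN (assumption): nearest class centroid on 1-D features.
def fit_centroids(xs, ys):
    sums, counts = {}, Counter()
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in sums}

def predict(centroids, x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def kfold_predictions(xs, ys, k, rng):
    """One K-fold pass: each sample is labelled by a model trained without it."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    preds = [None] * len(xs)
    for f in range(k):
        held = idx[f::k]
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            preds[i] = predict(model, xs[i])
    return preds

def normalized_entropy(votes, num_classes):
    """Entropy of the empirical vote distribution, scaled to [0, 1] (needs num_classes >= 2)."""
    n = len(votes)
    h = sum((c / n) * math.log(n / c) for c in Counter(votes).values())
    return h / math.log(num_classes)

def select_and_relabel(xs, ys, num_classes, k=5, rounds=3, seed=0):
    """Hypothetical E-NKCVS-style pass: returns (is_clean, soft_target) per sample."""
    rng = random.Random(seed)
    runs = [kfold_predictions(xs, ys, k, rng) for _ in range(rounds)]
    out = []
    for i, given in enumerate(ys):
        votes = [run[i] for run in runs]
        pseudo = Counter(votes).most_common(1)[0][0]  # majority-vote pseudo label
        w = 1.0 - normalized_entropy(votes, num_classes)  # unanimous votes -> full trust in pseudo label
        target = [w * (c == pseudo) + (1.0 - w) * (c == given) for c in range(num_classes)]
        out.append((pseudo == given, target))
    return out

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 8.0, 8.5, 9.0, 9.5, 10.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]  # samples 4 and 9 have flipped labels
res = select_and_relabel(xs, ys, num_classes=2)
print([i for i, (clean, _) in enumerate(res) if clean])  # [0, 1, 2, 3, 5, 6, 7, 8]
print(res[4][1])  # [1.0, 0.0]
```

In this toy data every held-out prediction recovers the true class, so the votes are unanimous and the two flipped samples receive a soft target equal to the pseudo label. When the votes disagree, the entropy rises and the target shifts back toward the given label, which is one plausible reading of "tuning the weight" described in the abstract.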
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv: 2024-04-10T07:34:37Z
- Is your noise correction noisy? PLS: Robustness to label noise with two stage detection
This paper proposes to improve the correction accuracy of noisy samples once they have been detected.
In many state-of-the-art contributions, a two-phase approach is adopted where the noisy samples are detected before guessing a corrected pseudo-label.
We propose the pseudo-loss, a simple metric that we find to be strongly correlated with pseudo-label correctness on noisy samples.
arXiv: 2022-10-10T11:32:28Z
- Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling
Training deep neural networks (DNNs) with noisy labels is challenging in practice.
Previous efforts tend to handle part or full data in a unified denoising flow.
We propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner.
arXiv: 2022-08-23T02:06:38Z
- Neighborhood Collective Estimation for Noisy Label Identification and Correction
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv: 2022-08-05T14:47:22Z
- Sample Prior Guided Robust Model Learning to Suppress Noisy Labels
We propose PGDF, a novel framework to learn a deep model to suppress noise by generating the samples' prior knowledge.
Our framework can save more informative hard clean samples into the cleanly labeled set.
We evaluate our method using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M.
arXiv: 2021-12-02T13:09:12Z
- S3: Supervised Self-supervised Learning under Label Noise
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR-10 and CIFAR-100 with artificial noise and on real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv: 2021-11-22T15:49:20Z
- Boosting Semi-Supervised Face Recognition with Noise Robustness
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise arising from auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN.
arXiv: 2021-05-10T14:43:11Z
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv: 2021-01-14T05:43:51Z
- A Second-Order Approach to Learning with Instance-Dependent Label Noise
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv: 2020-12-22T06:36:58Z
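The S3 and Neighborhood Collective Estimation summaries in this list both rely on comparing a sample's label with the labels of its feature-space neighbors. The sketch below shows the simplest version of that idea, under assumptions that are not taken from either paper: raw 2-D points stand in for learned features, distance is Euclidean, and a sample is kept when more than a hypothetical `threshold` fraction of its `k` nearest neighbors share its label. The function names are illustrative only.

```python
import math

def knn_agreement(points, labels, k=3):
    """Fraction of each point's k nearest neighbors (itself excluded) sharing its label."""
    scores = []
    for i, (p, y) in enumerate(zip(points, labels)):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        scores.append(sum(labels[j] == y for _, j in nearest) / k)
    return scores

def select_consistent(points, labels, k=3, threshold=0.5):
    """Keep indices whose neighborhood agreement exceeds the (hypothetical) threshold."""
    return [i for i, s in enumerate(knn_agreement(points, labels, k)) if s > threshold]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]  # point (1, 1) carries a flipped label
print(select_consistent(points, labels))  # [0, 1, 2, 4, 5, 6, 7]
```

The mislabeled point at (1, 1) has no neighbors sharing its label and is filtered out. Real methods compute neighbors in a learned embedding and, as the summaries note, use the neighborhood either to select samples (S3) or to re-estimate a sample's predictive reliability (Neighborhood Collective Estimation).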