Mitigating Memorization in Sample Selection for Learning with Noisy
Labels
- URL: http://arxiv.org/abs/2107.07041v1
- Date: Thu, 8 Jul 2021 06:44:04 GMT
- Title: Mitigating Memorization in Sample Selection for Learning with Noisy
Labels
- Authors: Kyeongbo Kong, Junggi Lee, Youngchul Kwak, Young-Rae Cho, Seong-Eun
Kim, and Woo-Jin Song
- Abstract summary: We propose a criterion that penalizes dominant-noisy-labeled samples intensively through class-wise penalty labels.
With the proposed sample selection, the learning process of the network becomes significantly more robust to noisy labels.
- Score: 4.679610943608667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Because deep learning is vulnerable to noisy labels, sample selection
techniques, which train networks with only clean-labeled data, have attracted
great attention. However, if the labels are dominantly corrupted by a few
classes (such noisy samples are called dominant-noisy-labeled samples), the
network also learns the dominant-noisy-labeled samples rapidly via content-aware
optimization. In this study, we propose a compelling criterion that penalizes
dominant-noisy-labeled samples intensively through class-wise penalty labels.
By averaging prediction confidences for each observed label, we obtain
suitable penalty labels that take high values when the labels are largely
corrupted by some classes. Experiments were performed on benchmarks
(CIFAR-10, CIFAR-100, Tiny-ImageNet) and real-world datasets (ANIMAL-10N,
Clothing1M) to evaluate the proposed criterion in various scenarios with
different noise rates. With the proposed sample selection, the learning
process of the network becomes significantly more robust to noisy labels than
existing methods under several noise types.
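To make the selection idea concrete, the sketch below shows one way class-wise penalty labels could be computed and used: the penalty label for class c is the average prediction confidence over samples observed as class c, and a sample's small-loss selection score is inflated when that penalty label places high confidence on other classes. This is a minimal NumPy illustration of the general idea only; the paper's exact penalty formulation and selection rule may differ, and the function names and the weighting term `alpha` are hypothetical.

```python
import numpy as np

def class_wise_penalty_labels(probs, observed_labels, num_classes):
    """Average prediction confidences over samples sharing each observed label.

    probs: (N, C) softmax outputs; observed_labels: (N,) int array of noisy labels.
    Row c of the result is the mean prediction for samples labeled c; large
    off-diagonal entries suggest label c is heavily corrupted toward those classes.
    """
    penalty = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        mask = observed_labels == c
        if mask.any():
            penalty[c] = probs[mask].mean(axis=0)
    return penalty

def select_clean(probs, observed_labels, losses, penalty, keep_ratio=0.5, alpha=1.0):
    """Small-loss selection with an extra penalty for samples whose observed
    label belongs to a heavily corrupted class (hypothetical scoring rule)."""
    # Penalty for sample i: confidence mass the class-c penalty label places on
    # classes other than c, weighted by the sample's own prediction.
    per_sample_penalty = np.array([
        probs[i] @ penalty[y] - probs[i, y] * penalty[y, y]
        for i, y in enumerate(observed_labels)
    ])
    scores = losses + alpha * per_sample_penalty   # lower score = more likely clean
    k = int(keep_ratio * len(losses))
    return np.argsort(scores)[:k]                  # indices treated as clean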
Related papers
- Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels [13.314778587751588]
Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching.
It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training.
We propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels.
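The summary does not spell out how the synthetic samples are built. Purely as an illustration of training on synthetic samples with soft labels, here is a mixup-style construction that blends two inputs and their one-hot labels into an interpolated sample with a soft target; the framework in the paper may generate its samples quite differently.

```python
import numpy as np

def make_synthetic_soft_label(x1, y1_onehot, x2, y2_onehot, alpha=0.75):
    """Hypothetical mixup-style synthetic sample with a soft (non one-hot) label;
    used only to illustrate the idea, not the paper's actual construction."""
    lam = np.random.beta(alpha, alpha)
    x_syn = lam * x1 + (1.0 - lam) * x2                  # blended input
    y_syn = lam * y1_onehot + (1.0 - lam) * y2_onehot    # soft target
    return x_syn, y_syn
```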
arXiv Detail & Related papers (2024-06-22T04:49:39Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
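As a rough illustration of prototype-based pseudo labeling, the sketch below forms each class prototype as the mean feature of samples carrying that (possibly noisy) label, assigns pseudo labels by the nearest prototype, and keeps an equal number of closest samples per class to obtain a class-balanced subset. The paper's distribution-matching formulation is more involved; the function name and the `per_class` budget are illustrative assumptions.

```python
import numpy as np

def prototype_balanced_subset(features, labels, num_classes, per_class=100):
    """Sketch: nearest-prototype pseudo labels plus a class-balanced subset.
    Assumes features is (N, D), labels is (N,) int, and every class has
    at least one labeled sample."""
    protos = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
    dists = np.linalg.norm(features[:, None, :] - protos[None, :, :], axis=-1)  # (N, C)
    pseudo = dists.argmin(axis=1)                       # pseudo label = nearest prototype
    subset = []
    for c in range(num_classes):
        idx = np.where(pseudo == c)[0]
        idx = idx[np.argsort(dists[idx, c])][:per_class]  # closest samples per class
        subset.extend(idx.tolist())
    return np.array(subset), pseudo
```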
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
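One straightforward way to use CLIP as a surrogate filter is to compare its zero-shot prediction against the given label and keep only agreeing samples. The sketch below uses the Hugging Face `transformers` CLIP API to implement that simple agreement rule; the paper's actual filtering criterion may differ.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Zero-shot CLIP as a noisy-label filter (illustrative agreement rule only).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_label_agreement(images, class_names, given_labels):
    """Return a boolean mask marking samples whose given label matches CLIP's
    zero-shot prediction (treated as 'clean'). images: list of PIL images."""
    prompts = [f"a photo of a {name}" for name in class_names]
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # (N, C) image-text similarity
    preds = logits.argmax(dim=-1)
    return preds == torch.as_tensor(given_labels)
```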
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels [78.88007892742438]
Training multi-label models with partial positive labels (MLR-PPL) attracts increasing attention.
Previous works regard unknown labels as negative and adopt traditional MLR algorithms.
We propose to explore semantic correlation among different images to facilitate the MLR-PPL task.
arXiv Detail & Related papers (2022-11-15T02:11:20Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
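A bare-bones version of neighbor-based reliability estimation is sketched below: each sample is scored by how much confidence its k nearest feature-space neighbors assign to its observed label, and low-scoring samples are treated as noisy. The paper's collective estimation goes beyond this simple average; the cosine-similarity neighbor search and thresholding step here are assumptions.

```python
import numpy as np

def neighborhood_reliability(features, probs, labels, k=10):
    """Score each sample by the mean confidence its k nearest neighbors place
    on its observed label. features: (N, D); probs: (N, C) softmax outputs;
    labels: (N,) int array. Brute-force similarity, so small datasets only."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                                   # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                  # exclude self-matches
    nn_idx = np.argsort(-sim, axis=1)[:, :k]        # indices of k nearest neighbors
    reliability = probs[nn_idx, labels[:, None]].mean(axis=1)
    return reliability                              # threshold to split clean/noisy
```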
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- PARS: Pseudo-Label Aware Robust Sample Selection for Learning with Noisy Labels [5.758073912084364]
We propose PARS: Pseudo-Label Aware Robust Sample Selection.
PARS exploits all training samples using both the raw/noisy labels and estimated/refurbished pseudo-labels via self-training.
Results show that PARS significantly outperforms the state of the art on extensive studies on the noisy CIFAR-10 and CIFAR-100 datasets.
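In the spirit of pseudo-label aware selection, the sketch below splits the data by small loss, keeps the raw labels for the low-loss portion, refurbishes high-confidence remaining samples with argmax pseudo-labels, and leaves the rest unlabeled for self-training. PARS's actual selection and refurbishment rules differ in their details; the thresholds and names here are illustrative only.

```python
import numpy as np

def pseudo_label_aware_split(losses, probs, keep_ratio=0.5, conf_thresh=0.9):
    """Rough three-way split: clean (trust raw labels), refurbished (use
    argmax pseudo-labels), and unlabeled (low-confidence leftovers)."""
    order = np.argsort(losses)
    n_keep = int(keep_ratio * len(losses))
    clean = order[:n_keep]                           # small-loss samples keep raw labels
    rest = order[n_keep:]
    conf = probs[rest].max(axis=1)
    refurbished = rest[conf >= conf_thresh]          # relabel confidently predicted samples
    pseudo_labels = probs[refurbished].argmax(axis=1)
    unlabeled = rest[conf < conf_thresh]             # used without labels (self-training)
    return clean, refurbished, pseudo_labels, unlabeled
```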
arXiv Detail & Related papers (2022-01-26T09:31:55Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR-10/CIFAR-100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels [0.9699640804685629]
Large-scale datasets tend to contain mislabeled samples that can be memorized by deep neural networks (DNNs).
We present Ensemble Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select clean samples from noisy data.
We evaluate our approach on various image and text classification tasks where the labels have been manually corrupted with different noise ratios.
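A simplified version of cross-validation-based selection is sketched below: held-out fold predictions, repeated over several shuffles, are compared with the given labels, and samples that agree in most repeats are kept as clean. A logistic-regression stand-in replaces the DNN for brevity, and the agreement threshold is an assumption; E-NKCVS's ensemble and label-weighting scheme are more sophisticated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def kfold_clean_selection(X, y, n_splits=5, n_repeats=3):
    """Keep a sample as clean if held-out predictions agree with its given
    label in a majority of repeated K-fold runs (illustrative rule only)."""
    agree = np.zeros(len(y))
    for seed in range(n_repeats):
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for train_idx, val_idx in kf.split(X):
            clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
            agree[val_idx] += (clf.predict(X[val_idx]) == y[val_idx])
    return np.where(agree >= n_repeats / 2)[0]       # indices treated as clean
```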
arXiv Detail & Related papers (2021-07-06T02:14:52Z)
- LongReMix: Robust Learning with High Confidence Samples in a Noisy Label Environment [33.376639002442914]
We propose the new 2-stage noisy-label training algorithm LongReMix.
We test LongReMix on the noisy-label benchmarks CIFAR-10, CIFAR-100, WebVision, Clothing1M, and Food101-N.
Our approach achieves state-of-the-art performance in most datasets.
arXiv Detail & Related papers (2021-03-06T18:48:40Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)