PARS: Pseudo-Label Aware Robust Sample Selection for Learning with Noisy
Labels
- URL: http://arxiv.org/abs/2201.10836v1
- Date: Wed, 26 Jan 2022 09:31:55 GMT
- Authors: Arushi Goel, Yunlong Jiao and Jordan Massiah
- Abstract summary: We propose PARS: Pseudo-Label Aware Robust Sample Selection.
PARS exploits all training samples using both the raw/noisy labels and estimated/refurbished pseudo-labels via self-training.
Results show that PARS significantly outperforms the state of the art on extensive studies on the noisy CIFAR-10 and CIFAR-100 datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Acquiring accurate labels on large-scale datasets is both
time-consuming and expensive. To reduce the dependency of deep learning models on learning from
clean labeled data, several recent research efforts are focused on learning
with noisy labels. These methods typically fall into three design categories to
learn a noise robust model: sample selection approaches, noise robust loss
functions, or label correction methods. In this paper, we propose PARS:
Pseudo-Label Aware Robust Sample Selection, a hybrid approach that combines the
best from all three worlds in a joint-training framework to achieve robustness
to noisy labels. Specifically, PARS exploits all training samples using both
the raw/noisy labels and estimated/refurbished pseudo-labels via self-training,
divides samples into an ambiguous and a noisy subset via loss analysis, and
designs label-dependent noise-aware loss functions for both sets of filtered
labels. Results show that PARS significantly outperforms the state of the art
in extensive experiments on the noisy CIFAR-10 and CIFAR-100 datasets, particularly
in challenging high-noise and low-resource settings. In particular, PARS
achieved an absolute 12% improvement in test accuracy on the CIFAR-100 dataset
with 90% symmetric label noise, and an absolute 27% improvement in test
accuracy when only 1/5 of the noisy labels are available during training as an
additional restriction. On a real-world noisy dataset, Clothing1M, PARS
achieves competitive results to the state of the art.
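The loss-analysis step above (dividing training samples into an ambiguous and a noisy subset) is not detailed in this summary; a generic small-loss sketch, with the percentile threshold as our assumption rather than the exact PARS criterion, could look like:

```python
import numpy as np

def partition_by_loss(losses, clean_frac=0.5):
    """Split sample indices into a low-loss ("ambiguous"/likely-clean) set and a
    high-loss ("noisy") set via a percentile threshold on per-sample loss.
    A generic small-loss heuristic, not the exact PARS rule."""
    losses = np.asarray(losses, dtype=float)
    thresh = np.quantile(losses, clean_frac)       # loss cutoff at the chosen quantile
    ambiguous = np.flatnonzero(losses <= thresh)   # indices kept with raw labels
    noisy = np.flatnonzero(losses > thresh)        # indices routed to pseudo-labels
    return ambiguous, noisy
```

With per-sample cross-entropy losses from a warm-up model, the low-loss subset would train on its raw labels and the high-loss subset on refurbished pseudo-labels, each under its own noise-aware loss.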
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Group Benefits Instances Selection for Data Purification [21.977432359384835]
Existing methods for combating label noise are typically designed and tested on synthetic datasets.
We propose a method named GRIP to alleviate the noisy label problem for both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-23T03:06:19Z)
- Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning [8.387189407144403]
Partial label learning (PLL) is a weakly-supervised learning paradigm where each training instance is paired with a set of candidate labels (a partial label) that contains the true label.
Noisy PLL (NPLL) relaxes this constraint by allowing some partial labels to not contain the true label, enhancing the practicality of the problem.
We present a minimalistic framework that initially assigns pseudo-labels to images by exploiting the noisy partial labels through a weighted nearest neighbour algorithm.
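The weighted nearest-neighbour pseudo-labelling step might be sketched as follows; the inverse-distance weighting and the restriction to each sample's own candidate set are our assumptions, not necessarily the paper's exact scheme:

```python
import numpy as np

def weighted_knn_pseudo_labels(features, candidate_masks, k=3, eps=1e-8):
    """Assign pseudo-labels by a distance-weighted vote of each sample's k
    nearest neighbours over their candidate label sets, restricted to the
    sample's own candidates. candidate_masks: (n, n_classes) boolean array."""
    features = np.asarray(features, dtype=float)
    masks = np.asarray(candidate_masks, dtype=float)
    n = len(features)
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                       # a sample never votes for itself
    nn = np.argsort(d, axis=1)[:, :k]                 # k nearest neighbour indices
    w = 1.0 / (d[np.arange(n)[:, None], nn] + eps)    # inverse-distance weights
    votes = np.einsum('ij,ijc->ic', w, masks[nn])     # weighted votes per class
    votes = np.where(candidate_masks, votes, -np.inf) # keep own candidates only
    return votes.argmax(axis=1)
```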
arXiv Detail & Related papers (2024-02-07T13:32:47Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
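A minimal sketch of the neighbourhood-contrast idea, assuming reliability is simply the fraction of a sample's k nearest feature-space neighbours that share its predicted label (the paper's estimator is more elaborate):

```python
import numpy as np

def neighbor_agreement(features, pred_labels, k=3):
    """Estimate each sample's predictive reliability as the fraction of its k
    nearest neighbours (Euclidean, in feature space) sharing its predicted label."""
    features = np.asarray(features, dtype=float)
    pred_labels = np.asarray(pred_labels)
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude the sample itself
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest neighbour indices
    return (pred_labels[nn] == pred_labels[:, None]).mean(axis=1)
```

Samples with low agreement scores would be treated as unreliable and routed to verification or correction rather than trusted directly.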
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Reliable Label Correction is a Good Booster When Learning with Extremely Noisy Labels [65.79898033530408]
We introduce a novel framework, termed LC-Booster, to explicitly tackle learning under extreme noise.
LC-Booster incorporates label correction into the sample selection, so that more purified samples, through the reliable label correction, can be utilized for training.
Experiments show that LC-Booster advances state-of-the-art results on several noisy-label benchmarks.
arXiv Detail & Related papers (2022-04-30T07:19:03Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- Sample Prior Guided Robust Model Learning to Suppress Noisy Labels [8.119439844514973]
We propose PGDF, a novel framework to learn a deep model to suppress noise by generating the samples' prior knowledge.
Our framework can save more informative hard clean samples into the cleanly labeled set.
We evaluate our method using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M.
arXiv Detail & Related papers (2021-12-02T13:09:12Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR-10 and CIFAR-100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels [0.9699640804685629]
Large-scale datasets tend to contain mislabeled samples that can be memorized by deep neural networks (DNNs).
We present Ensemble Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select clean samples from noisy data.
We evaluate our approach on various image and text classification tasks where the labels have been manually corrupted with different noise ratios.
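The cross-validation selection idea can be sketched as follows: a sample is flagged clean when a model trained on the other folds predicts its given label. The toy nearest-centroid classifier below stands in for the DNN ensemble and soft-label aggregation that E-NKCVS actually uses:

```python
import numpy as np

def kfold_select_clean(X, y, k=3, seed=0):
    """Flag a sample clean if a classifier trained on the remaining folds
    predicts its given label. Toy nearest-centroid stand-in for the DNNs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    clean = np.zeros(len(X), dtype=bool)
    for i in range(k):
        held = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        classes = np.unique(y[train])
        # one centroid per class, fit on the training folds only
        centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(X[held][:, None] - centroids[None], axis=-1)
        preds = classes[d.argmin(axis=1)]
        clean[held] = preds == y[held]   # agreement with the given label
    return clean
```

E-NKCVS further ensembles several such runs with different fold splits and aggregates soft labels, which this single-pass sketch omits.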
arXiv Detail & Related papers (2021-07-06T02:14:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.