Class Prototype-based Cleaner for Label Noise Learning
- URL: http://arxiv.org/abs/2212.10766v1
- Date: Wed, 21 Dec 2022 04:56:41 GMT
- Title: Class Prototype-based Cleaner for Label Noise Learning
- Authors: Jingjia Huang, Yuanqi Chen, Jiashi Feng, Xinglong Wu
- Abstract summary: Semi-supervised learning methods are the current SOTA solutions to the noisy-label learning problem.
We propose a simple yet effective solution, named Class Prototype-based label noise Cleaner (CPC).
- Score: 73.007001454085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning based methods are the current SOTA solutions to the
noisy-label learning problem. They rely on first learning an unsupervised label
cleaner to divide the training samples into a labeled set of clean data
and an unlabeled set of noisy data. Typically, the cleaner is obtained via
fitting a mixture model to the distribution of per-sample training losses.
However, the modeling procedure is \emph{class agnostic}: it assumes the loss
distributions of clean and noisy samples are the same across different classes.
In practice, this assumption does not always hold, because classes vary in
learning difficulty, which leads to sub-optimal
label noise partition criteria. In this work, we reveal this long-ignored
problem and propose a simple yet effective solution, named \textbf{C}lass
\textbf{P}rototype-based label noise \textbf{C}leaner (\textbf{CPC}). Unlike
previous works that treat all classes equally, CPC fully considers loss
distribution heterogeneity and applies class-aware modulation to partition
clean and noisy data. CPC exploits loss distribution modeling and
intra-class consistency regularization in feature space simultaneously, and thus
can better distinguish clean from noisy labels. We theoretically justify the
effectiveness of our method by explaining it from the Expectation-Maximization
(EM) framework. Extensive experiments are conducted on the noisy-label
benchmarks CIFAR-10, CIFAR-100, Clothing1M and WebVision. The results show that
CPC consistently brings about performance improvement across all benchmarks.
Codes and pre-trained models will be released at
\url{https://github.com/hjjpku/CPC.git}.
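The class-agnostic cleaner described in the abstract is typically a two-component Gaussian mixture fitted to per-sample training losses, with the low-loss component treated as "clean". A minimal sketch of that baseline and of a per-class variant in the spirit of CPC's class-aware modulation (function names and the per-class fitting scheme are illustrative assumptions, not the paper's exact procedure):

```python
# Sketch: class-agnostic vs. class-aware clean/noisy partition of
# per-sample training losses via a 2-component Gaussian mixture.
# This illustrates the general DivideMix-style cleaner discussed in
# the abstract; it is NOT the paper's exact CPC algorithm.
import numpy as np
from sklearn.mixture import GaussianMixture


def clean_probability(losses):
    """Fit a 2-component GMM to 1-D losses; return P(clean) per sample.

    The component with the smaller mean is taken to be the clean one,
    since clean samples tend to incur lower training loss.
    """
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    clean_comp = int(np.argmin(gmm.means_.ravel()))  # low-loss component
    return gmm.predict_proba(losses)[:, clean_comp]


def class_aware_clean_probability(losses, labels):
    """Per-class modulation: fit a separate loss GMM for each class.

    Classes with different learning difficulty then get their own
    clean/noisy partition criterion instead of one shared threshold.
    """
    losses = np.asarray(losses, dtype=np.float64)
    labels = np.asarray(labels)
    probs = np.empty_like(losses)
    for c in np.unique(labels):
        idx = labels == c
        probs[idx] = clean_probability(losses[idx])
    return probs
```

Samples with high P(clean) would then form the labeled set and the rest the unlabeled set for the downstream semi-supervised learner; CPC additionally uses class prototypes and intra-class consistency in feature space, which this sketch omits.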
Related papers
- CLIPCleaner: Cleaning Noisy Labels with CLIP [36.434849361479316]
CLIPCleaner is a zero-shot classifier for efficient, offline, clean sample selection.
CLIPCleaner offers a simple, single-step approach that achieves competitive or superior performance on benchmark datasets.
arXiv Detail & Related papers (2024-08-19T14:05:58Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise [4.90148689564172]
Real-world datasets contain noisy label samples that have no semantic relevance to any class in the dataset.
Most state-of-the-art methods leverage in-distribution (ID) labeled noisy samples as unlabeled data for semi-supervised learning.
We propose incorporating the information from all the training data by leveraging the benefits of self-supervised training.
arXiv Detail & Related papers (2023-08-13T23:33:33Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, which easily gives rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- From Noisy Prediction to True Label: Noisy Prediction Calibration via Generative Model [22.722830935155223]
Noisy Prediction Calibration (NPC) is a new approach to learning with noisy labels.
NPC corrects the noisy prediction from the pre-trained classifier to the true label as a post-processing scheme.
Our method boosts the classification performances of all baseline models on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-02T07:15:45Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- LongReMix: Robust Learning with High Confidence Samples in a Noisy Label Environment [33.376639002442914]
We propose LongReMix, a new two-stage noisy-label training algorithm.
We test LongReMix on the noisy-label benchmarks CIFAR-10, CIFAR-100, WebVision, Clothing1M, and Food101-N.
Our approach achieves state-of-the-art performance in most datasets.
arXiv Detail & Related papers (2021-03-06T18:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information and is not responsible for any consequences of its use.