Sample Prior Guided Robust Model Learning to Suppress Noisy Labels
- URL: http://arxiv.org/abs/2112.01197v2
- Date: Sun, 5 Dec 2021 09:20:51 GMT
- Title: Sample Prior Guided Robust Model Learning to Suppress Noisy Labels
- Authors: Wenkai Chen, Chuang Zhu, Yi Chen
- Abstract summary: We propose PGDF, a novel framework to learn a deep model to suppress noise by generating the samples' prior knowledge.
Our framework retains more informative hard clean samples in the cleanly labeled set.
We evaluate our method using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M.
- Score: 8.119439844514973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imperfect labels are ubiquitous in real-world datasets and seriously harm the
model performance. Several recent effective methods for handling noisy labels
have two key steps: 1) dividing samples into cleanly labeled and wrongly
labeled sets by training loss, 2) using semi-supervised methods to generate
pseudo-labels for samples in the wrongly labeled set. However, current methods
often hurt the informative hard samples because hard samples and noisy samples
have similar loss distributions. In this paper, we propose PGDF (Prior Guided
Denoising Framework), a novel framework that learns a deep model to suppress
noise by generating the samples' prior knowledge, which is integrated into both
the sample dividing step and the semi-supervised step. Our framework retains
more of the informative hard clean samples in the cleanly labeled set. Besides,
our framework also improves the quality of pseudo-labels during the
semi-supervised step by suppressing the noise in the pseudo-label generation
scheme. To further exploit the hard samples, we reweight the samples in the
cleanly labeled set during training. We evaluate our method using
synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the
real-world datasets WebVision and Clothing1M. The results demonstrate
substantial improvements over state-of-the-art methods.
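The loss-based division step described in the abstract is commonly implemented by fitting a two-component mixture model to per-sample training losses and treating the low-loss component as clean (as in DivideMix-style pipelines). The sketch below illustrates only that generic scheme under stated assumptions; the function name and threshold are illustrative, and it does not reproduce PGDF's prior-guided division.

```python
# Minimal sketch of loss-based clean/noisy division, assuming a two-component
# Gaussian Mixture Model over per-sample losses (DivideMix-style). PGDF
# additionally integrates generated sample priors into this step, which this
# sketch does not reproduce.
import numpy as np
from sklearn.mixture import GaussianMixture

def divide_by_loss(per_sample_loss, clean_threshold=0.5):
    """Split sample indices into a 'clean' set and a 'noisy' set.

    per_sample_loss: 1-D array of training losses, one per sample.
    clean_threshold: minimum posterior probability of the low-loss
                     component for a sample to be kept as clean.
    """
    losses = np.asarray(per_sample_loss, dtype=np.float64).reshape(-1, 1)
    # Normalize losses to [0, 1] to stabilize the GMM fit.
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)

    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)

    # Posterior probability of belonging to the low-mean (low-loss) component.
    low_loss_component = int(np.argmin(gmm.means_.ravel()))
    p_clean = gmm.predict_proba(losses)[:, low_loss_component]

    clean_idx = np.where(p_clean >= clean_threshold)[0]
    noisy_idx = np.where(p_clean < clean_threshold)[0]
    return clean_idx, noisy_idx, p_clean
```

According to the abstract, PGDF further combines this kind of loss-based evidence with the samples' prior knowledge so that hard clean samples are not pushed into the noisy set; that part is not shown here.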
Related papers
- Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels [13.314778587751588]
Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching.
It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training.
We propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels.
arXiv Detail & Related papers (2024-06-22T04:49:39Z)
- Robust Noisy Label Learning via Two-Stream Sample Distillation [48.73316242851264]
Noisy label learning aims to learn robust networks under the supervision of noisy labels.
We design a simple yet effective sample selection framework, termed Two-Stream Sample Distillation (TSSD).
This framework can extract more high-quality samples with clean labels to improve the robustness of network training.
arXiv Detail & Related papers (2024-04-16T12:18:08Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling [22.62790706276081]
Training deep neural network (DNN) with noisy labels is practically challenging.
Previous efforts tend to handle either part of the data or all of it in a single unified denoising flow.
We propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner.
arXiv Detail & Related papers (2022-08-23T02:06:38Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space (a minimal sketch of this idea appears after this list).
Our method significantly surpasses previous methods on both CIFAR-10 and CIFAR-100 with artificial noise and on real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels [0.9699640804685629]
Large-scale datasets tend to contain mislabeled samples that can be memorized by deep neural networks (DNNs).
We present Ensemble Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select clean samples from noisy data.
We evaluate our approach on various image and text classification tasks where the labels have been manually corrupted with different noise ratios.
arXiv Detail & Related papers (2021-07-06T02:14:52Z)
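Several of the entries above, notably S3 and Neighborhood Collective Estimation, select or re-verify samples by checking agreement between a sample's annotated label and the labels of its feature-space nearest neighbors. The sketch below shows that generic neighborhood-consistency idea only; the function name, k, and agreement threshold are illustrative assumptions, not values taken from either paper.

```python
# Hedged sketch of neighborhood-consistency sample selection: trust a sample's
# annotated label only if it agrees with most of its k nearest neighbors in
# feature space. All names and thresholds here are illustrative.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_by_neighborhood(features, labels, k=10, agreement=0.6):
    """Return indices of samples whose label matches at least `agreement`
    fraction of the labels of their k nearest feature-space neighbors."""
    labels = np.asarray(labels)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    # The first neighbor of each point is the point itself, so drop column 0.
    _, idx = nn.kneighbors(features)
    neighbor_labels = labels[idx[:, 1:]]
    agree_frac = (neighbor_labels == labels[:, None]).mean(axis=1)
    return np.where(agree_frac >= agreement)[0]
```

In practice, `features` would be penultimate-layer embeddings of the current model; the selected indices would then form the cleanly labeled set for the next training round.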
This list is automatically generated from the titles and abstracts of the papers on this site.