Self-Filtering: A Noise-Aware Sample Selection for Label Noise with
Confidence Penalization
- URL: http://arxiv.org/abs/2208.11351v1
- Date: Wed, 24 Aug 2022 08:02:36 GMT
- Title: Self-Filtering: A Noise-Aware Sample Selection for Label Noise with
Confidence Penalization
- Authors: Qi Wei, Haoliang Sun, Xiankai Lu, Yilong Yin
- Abstract summary: We propose a novel selection strategy, Self-Filtering (SFT), that utilizes the fluctuation of noisy examples in historical predictions to filter them.
Specifically, we introduce a memory bank module that stores the historical predictions of each example and dynamically updates to support the selection for the subsequent learning iteration.
By devising a regularization term that increases the weight of the misclassified categories, the loss function becomes robust to label noise under mild conditions.
- Score: 39.90342091782778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sample selection is an effective strategy to mitigate the effect of label
noise in robust learning. Typical strategies commonly apply the small-loss
criterion to identify clean samples. However, samples lying near the decision
boundary with large losses are usually entangled with noisy examples and would
be discarded under this criterion, severely degrading generalization
performance. In this paper, we propose a novel selection
strategy, \textbf{S}elf-\textbf{F}il\textbf{t}ering (SFT), that utilizes the
fluctuation of noisy examples in historical predictions to filter them, which
can avoid the selection bias of the small-loss criterion for the boundary
examples. Specifically, we introduce a memory bank module that stores the
historical predictions of each example and dynamically updates to support the
selection for the subsequent learning iteration. Besides, to reduce the
accumulated error of the sample selection bias of SFT, we devise a
regularization term to penalize the confident output distribution. By
increasing the weight of the misclassified categories with this term, the loss
function becomes robust to label noise under mild conditions. We conduct extensive
experiments on three benchmarks with various noise types and achieve a new
state-of-the-art. Ablation studies and further analysis verify the merit of
SFT for sample selection in robust learning.
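As a rough illustration of the selection rule the abstract describes, the sketch below flags an example as noisy when its predicted label "fluctuates" in a memory bank of historical predictions (correct at an earlier epoch but wrong at a later one), and adds a confidence-penalization term that spreads loss weight onto the non-target categories. This is not the authors' implementation: the exact fluctuation test, the penalty form, and the `lam` hyperparameter are assumptions reconstructed from the abstract alone.

```python
import math

def _argmax(p):
    """Index of the largest entry of a probability vector."""
    return max(range(len(p)), key=p.__getitem__)

def is_fluctuating(history, label):
    """Flag an example whose prediction was correct at some epoch but
    wrong at a later one -- the 'fluctuation' signal from the abstract
    (exact rule assumed, not taken from the paper)."""
    seen_correct = False
    for probs in history:
        if _argmax(probs) == label:
            seen_correct = True
        elif seen_correct:  # correct earlier, wrong now -> fluctuated
            return True
    return False

def select_clean(memory_bank, labels):
    """memory_bank[i] is the stored list of per-epoch softmax outputs
    for example i; keep indices that never fluctuated."""
    return [i for i, hist in enumerate(memory_bank)
            if not is_fluctuating(hist, labels[i])]

def penalized_loss(probs, label, lam=0.1):
    """Cross-entropy plus a confidence penalty that puts weight on the
    non-target (misclassified) categories, discouraging over-confident
    near-one-hot outputs; the penalty form is an assumption."""
    eps = 1e-12
    ce = -math.log(probs[label] + eps)
    penalty = -sum(math.log(probs[j] + eps)
                   for j in range(len(probs))
                   if j != label) / max(len(probs) - 1, 1)
    return ce + lam * penalty

# Toy usage: example 0 is stably correct, example 1 flips after being
# correct, so only example 0 is selected as clean.
bank = [
    [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]],
    [[0.7, 0.3], [0.4, 0.6], [0.55, 0.45]],
]
print(select_clean(bank, [0, 0]))  # -> [0]
```

Note the design choice implied by the abstract: because selection uses the *trajectory* of predictions rather than a single-epoch loss threshold, hard-but-clean boundary examples (large loss, stable prediction) are retained instead of being discarded by the small-loss criterion.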
Related papers
- ANNE: Adaptive Nearest Neighbors and Eigenvector-based Sample Selection for Robust Learning with Noisy Labels [7.897299759691143]
This paper introduces the Adaptive Nearest Neighbors and Eigenvector-based (ANNE) sample selection methodology.
ANNE integrates loss-based sampling with the feature-based sampling methods FINE and Adaptive KNN to optimize performance across a wide range of noise rate scenarios.
arXiv Detail & Related papers (2024-11-03T15:51:38Z) - Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner.
A mean-teacher model is then employed to correct labels of noisy samples.
We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z) - Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z) - Regroup Median Loss for Combating Label Noise [19.51996047333779]
Deep model training requires large-scale datasets of annotated data.
Due to the difficulty of annotating a large number of samples, label noise caused by incorrect annotations is inevitable.
We propose Regroup Median Loss (RML) to reduce the probability of selecting noisy samples and correct losses of noisy samples.
arXiv Detail & Related papers (2023-12-11T10:19:55Z) - Combating Label Noise With A General Surrogate Model For Sample
Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z) - Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability [85.1044381834036]
We investigate the implicit regularization effects of label noises under mini-batch sampling settings of gradient descent.
We find such implicit regularizer would favor some convergence points that could stabilize model outputs against perturbation of parameters.
Our work does not assume SGD behaves as an Ornstein-Uhlenbeck-like process and achieves a more general result, with convergence of the approximation proved.
arXiv Detail & Related papers (2023-04-01T14:09:07Z) - PASS: Peer-Agreement based Sample Selection for training with Noisy Labels [16.283722126438125]
The prevalence of noisy-label samples poses a significant challenge in deep learning, inducing overfitting effects.
Current methodologies often rely on the small-loss hypothesis or feature-based selection to separate noisy- and clean-label samples.
We propose a new noisy-label detection method, termed Peer-Agreement based Sample Selection (PASS), to address this problem.
arXiv Detail & Related papers (2023-03-20T00:35:33Z) - Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency)
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic
Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.