Learning with Noisy Labels over Imbalanced Subpopulations
- URL: http://arxiv.org/abs/2211.08722v1
- Date: Wed, 16 Nov 2022 07:25:24 GMT
- Title: Learning with Noisy Labels over Imbalanced Subpopulations
- Authors: MingCai Chen, Yu Zhao, Bing He, Zongbo Han, Bingzhe Wu, Jianhua Yao
- Abstract summary: Learning with noisy labels (LNL) has attracted significant attention from the research community.
We propose a novel LNL method to simultaneously deal with noisy labels and imbalanced subpopulations.
We introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning with Noisy Labels (LNL) has attracted significant attention from the
research community. Many recent LNL methods rely on the assumption that clean
samples tend to have "small loss". However, this assumption often fails to
generalize to real-world cases with imbalanced subpopulations, i.e.,
training subpopulations that vary in sample size or recognition difficulty.
Therefore, recent LNL methods risk misclassifying "informative" samples
(e.g., hard samples or samples from tail subpopulations) as noisy samples,
leading to poor generalization performance.
To address the above issue, we propose a novel LNL method to simultaneously
deal with noisy labels and imbalanced subpopulations. It first leverages sample
correlation to estimate samples' clean probabilities for label correction and
then utilizes corrected labels for Distributionally Robust Optimization (DRO)
to further improve the robustness. Specifically, in contrast to previous works
using classification loss as the selection criterion, we introduce a
feature-based metric that takes the sample correlation into account for
estimating samples' clean probabilities. Then, we refurbish the noisy labels
using the estimated clean probabilities and the pseudo-labels from the model's
predictions. With refurbished labels, we use DRO to train the model to be
robust to subpopulation imbalance. Extensive experiments on a wide range of
benchmarks demonstrate that our technique can consistently improve current
state-of-the-art robust learning paradigms against noisy labels, especially
when encountering imbalanced subpopulations.
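The pipeline described in the abstract (feature-based clean-probability estimation from sample correlation, label refurbishment with pseudo-labels, then DRO training) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the k-nearest-neighbor label-agreement criterion, the convex-combination refurbishment, and the worst-group objective are simplifying assumptions.

```python
import numpy as np

def clean_probability(features, noisy_labels, k=3):
    """Estimate each sample's probability of being clean from the label
    agreement of its k nearest neighbors in feature space (a feature-based
    criterion, in contrast to the usual small-loss criterion)."""
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude the sample itself
    probs = np.empty(len(features))
    for i in range(len(features)):
        nn = np.argsort(d[i])[:k]            # indices of k nearest neighbors
        probs[i] = np.mean(noisy_labels[nn] == noisy_labels[i])
    return probs

def refurbish(noisy_onehot, pseudo, clean_prob):
    """Mix the given (one-hot) label with the model's pseudo-label,
    weighted by the estimated clean probability."""
    w = clean_prob[:, None]
    return w * noisy_onehot + (1.0 - w) * pseudo

def group_dro_loss(losses, groups):
    """Worst-group objective (the core of group DRO): optimize the loss of
    the worst-performing subpopulation instead of the average."""
    return max(losses[groups == g].mean() for g in np.unique(groups))
```

With two well-separated feature clusters, a sample whose label disagrees with all of its neighbors receives a clean probability of zero, so its refurbished label falls back to the model's pseudo-label.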
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Regroup Median Loss for Combating Label Noise [19.51996047333779]
Deep model training requires large-scale datasets of annotated data.
Due to the difficulty of annotating a large number of samples, label noise caused by incorrect annotations is inevitable.
We propose Regroup Median Loss (RML) to reduce the probability of selecting noisy samples and correct losses of noisy samples.
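The regroup-median idea summarized above can be sketched as follows: each sample's loss is replaced by the median over several regrouped loss estimates, so a single corrupted, very large loss cannot dominate training. This is a hedged NumPy sketch; the group size, the number of regroupings, and the averaging scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def regroup_median_loss(losses, labels, n_groups=3, group_size=2, rng=None):
    """For each sample, draw a few same-class companions, average their
    losses with the sample's own, and take the median over several such
    draws as a robust surrogate loss."""
    rng = np.random.default_rng(rng)
    robust = np.empty(len(losses))
    for i in range(len(losses)):
        peers = np.flatnonzero(labels == labels[i])
        peers = peers[peers != i]            # exclude the sample itself
        if len(peers) == 0:
            robust[i] = losses[i]            # no same-class peers to regroup
            continue
        estimates = []
        for _ in range(n_groups):
            size = min(group_size, len(peers))
            companions = rng.choice(peers, size=size, replace=False)
            estimates.append(0.5 * (losses[i] + losses[companions].mean()))
        robust[i] = np.median(estimates)
    return robust
```

A sample with an abnormally large loss (a likely noisy label) ends up with a surrogate loss pulled down toward its same-class peers.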
arXiv Detail & Related papers (2023-12-11T10:19:55Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
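The observation underlying Late Stopping, as summarized above, is that mislabeled examples need more epochs before they are consistently classified as their given label. Given a per-epoch record of correctness, the quantity of interest can be sketched as below; the trailing-run definition of "consistently correct" is an illustrative assumption.

```python
import numpy as np

def first_consistent_epoch(correct_history):
    """correct_history: bool array of shape (n_epochs, n_examples).
    Returns, per example, the first epoch from which every later epoch's
    prediction is correct, or -1 if never consistently correct."""
    n_epochs, n_examples = correct_history.shape
    out = np.full(n_examples, -1)
    for j in range(n_examples):
        col = correct_history[:, j]
        # Scan backward to find the start of the trailing run of True.
        e = n_epochs
        while e > 0 and col[e - 1]:
            e -= 1
        if e < n_epochs:                     # at least the last epoch is correct
            out[j] = e
    return out
```

Examples with a large (or missing) first-consistent epoch are candidates for being mislabeled.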
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- PASS: Peer-Agreement based Sample Selection for training with Noisy Labels [16.283722126438125]
The prevalence of noisy-label samples poses a significant challenge in deep learning, inducing overfitting effects.
Current methodologies often rely on the small-loss hypothesis or feature-based selection to separate noisy- and clean-label samples.
We propose a new noisy-label detection method, termed Peer-Agreement based Sample Selection (PASS), to address this problem.
arXiv Detail & Related papers (2023-03-20T00:35:33Z)
- Improving group robustness under noisy labels using predictive uncertainty [0.9449650062296823]
We use the predictive uncertainty of a model to improve the worst-group accuracy under noisy labels.
We propose a novel ENtropy based Debiasing (END) framework that prevents models from learning the spurious cues while being robust to the noisy labels.
arXiv Detail & Related papers (2022-12-14T04:40:50Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, which easily gives rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
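The neighborhood idea summarized above can be sketched as follows: instead of trusting a sample's own (possibly overfit) predicted distribution, its noisy label is re-scored by the predictions of its feature-space nearest neighbors. This is a hedged NumPy sketch; the choice of k and the averaging rule are illustrative assumptions.

```python
import numpy as np

def neighborhood_reliability(features, pred_probs, noisy_labels, k=3):
    """Score the reliability of sample i's noisy label as the average
    probability that its k feature-space nearest neighbors' predicted
    distributions assign to that label."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude the sample itself
    rel = np.empty(len(features))
    for i in range(len(features)):
        nn = np.argsort(d[i])[:k]            # k nearest neighbors
        rel[i] = pred_probs[nn, noisy_labels[i]].mean()
    return rel
```

Contrasting a sample against its neighbors, rather than against its own prediction, is what reduces the confirmation bias mentioned above.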
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
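The two-view consistency signal mentioned above can be sketched with a Jensen-Shannon divergence between the predictions on two augmented views of the same sample: low divergence (high agreement) suggests a clean sample. A hedged sketch, with the JS criterion as an illustrative choice:

```python
import numpy as np

def view_agreement(p1, p2, eps=1e-12):
    """Jensen-Shannon divergence between the predicted distributions from
    two views of a sample; lower values indicate stronger agreement."""
    m = 0.5 * (p1 + p2)                      # mixture distribution
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)), axis=-1)
    return 0.5 * kl(p1, m) + 0.5 * kl(p2, m)
```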
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.