Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection
- URL: http://arxiv.org/abs/2402.11242v1
- Date: Sat, 17 Feb 2024 10:34:53 GMT
- Title: Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection
- Authors: Huafeng Liu, Mengmeng Sheng, Zeren Sun, Yazhou Yao, Xian-Sheng Hua,
and Heng-Tao Shen
- Abstract summary: Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
- Score: 82.43311784594384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning with noisy labels has gained increasing attention because the
inevitable imperfect labels in real-world scenarios can substantially hurt the
deep model performance. Recent studies tend to regard low-loss samples as clean
ones and discard high-loss ones to alleviate the negative impact of noisy
labels. However, real-world datasets contain not only noisy labels but also
class imbalance. The imbalance issue is prone to causing failure in loss-based sample selection, since the under-learning of tail classes also tends to produce high losses. To this end, we propose a simple yet effective method
to address noisy labels in imbalanced datasets. Specifically, we propose
Class-Balance-based sample Selection (CBS) to prevent the tail class samples
from being neglected during training. We further propose Confidence-based Sample Augmentation (CSA) for the chosen clean samples to enhance their reliability in the training process. To exploit the selected noisy samples, we resort to the prediction history to rectify their labels. Moreover, we introduce
the Average Confidence Margin (ACM) metric to measure the quality of corrected
labels by leveraging the model's evolving training dynamics, thereby ensuring
that low-quality corrected noisy samples are appropriately masked out. Lastly,
consistency regularization is imposed on filtered label-corrected noisy samples
to boost model performance. Comprehensive experimental results on synthetic and real-world datasets demonstrate the effectiveness and superiority of our proposed method, especially in imbalanced scenarios.
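To make the selection and filtering steps above concrete, the following sketch illustrates the two ideas in PyTorch. It is a minimal illustration based only on the abstract, not the authors' released code: the function names, the per-class keep_ratio, and the fixed margin threshold are assumptions, and the paper's ACM additionally averages margins over the model's prediction history rather than using the single snapshot shown here.

import torch

def select_class_balanced(losses, labels, num_classes, keep_ratio=0.5):
    # Pick the lowest-loss keep_ratio fraction of samples within each class,
    # so tail classes are not crowded out by a single global loss threshold.
    selected = torch.zeros_like(losses, dtype=torch.bool)
    for c in range(num_classes):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        k = max(1, int(keep_ratio * idx.numel()))
        order = torch.argsort(losses[idx])  # ascending: low loss first
        selected[idx[order[:k]]] = True
    return selected

def confidence_margin_mask(probs, corrected_labels, threshold=0.3):
    # Keep a corrected sample only if the probability of its corrected label
    # exceeds the best competing class by at least `threshold` (a single-step
    # stand-in for the averaged confidence margin described in the abstract).
    p_label = probs.gather(1, corrected_labels.view(-1, 1)).squeeze(1)
    rival = probs.clone()
    rival.scatter_(1, corrected_labels.view(-1, 1), float("-inf"))
    margin = p_label - rival.max(dim=1).values
    return margin >= threshold

In this reading, selecting low-loss samples per class rather than over the whole batch is what keeps tail classes represented, and the margin mask plays the role of ACM-style quality control that screens out unreliable corrected labels.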
Related papers
- Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a self-adaptive and class-balanced manner.
A mean-teacher model is then employed to correct labels of noisy samples.
We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z) - Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Combating Label Noise With A General Surrogate Model For Sample
Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z) - Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z) - SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised
Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z) - Improving group robustness under noisy labels using predictive
uncertainty [0.9449650062296823]
We use the predictive uncertainty of a model to improve the worst-group accuracy under noisy labels.
We propose a novel ENtropy based Debiasing (END) framework that prevents models from learning the spurious cues while being robust to the noisy labels.
arXiv Detail & Related papers (2022-12-14T04:40:50Z) - Learning with Noisy Labels over Imbalanced Subpopulations [13.477553187049462]
Learning with noisy labels (LNL) has attracted significant attention from the research community.
We propose a novel LNL method to simultaneously deal with noisy labels and imbalanced subpopulations.
We introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities.
arXiv Detail & Related papers (2022-11-16T07:25:24Z) - Learning from Noisy Labels with Coarse-to-Fine Sample Credibility
Modeling [22.62790706276081]
Training deep neural network (DNN) with noisy labels is practically challenging.
Previous efforts tend to handle part or full data in a unified denoising flow.
We propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner.
arXiv Detail & Related papers (2022-08-23T02:06:38Z) - Neighborhood Collective Estimation for Noisy Label Identification and
Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z) - Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets [23.4536532321199]
Inspired by our observations, we propose an Uncertainty-aware Label Correction framework to handle label noise on imbalanced datasets.
arXiv Detail & Related papers (2022-07-12T11:35:55Z)