Identifying Hard Noise in Long-Tailed Sample Distribution
- URL: http://arxiv.org/abs/2207.13378v3
- Date: Wed, 15 Oct 2025 05:47:03 GMT
- Title: Identifying Hard Noise in Long-Tailed Sample Distribution
- Authors: Xuanyu Yi, Kaihua Tang, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang
- Abstract summary: We introduce Noisy Long-Tailed Classification (NLT). Most de-noising methods fail to identify the hard noises. We design an iterative noisy learning framework called Hard-to-Easy (H2E).
- Score: 71.8462682319137
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional de-noising methods rely on the assumption that all samples are independent and identically distributed, so the resultant classifier, though disturbed by noise, can still easily identify the noises as outliers of the training distribution. However, the assumption is unrealistic for large-scale data, which is inevitably long-tailed. Such imbalanced training data makes a classifier less discriminative for the tail classes, whose previously "easy" noises are now turned into "hard" ones -- they are almost indistinguishable from the clean tail samples, which themselves look like outliers. We introduce this new challenge as Noisy Long-Tailed Classification (NLT). Not surprisingly, we find that most de-noising methods fail to identify the hard noises, resulting in a significant performance drop on the three proposed NLT benchmarks: ImageNet-NLT, Animal10-NLT, and Food101-NLT. To this end, we design an iterative noisy learning framework called Hard-to-Easy (H2E). Our bootstrapping philosophy is to first learn a classifier as a noise identifier invariant to class and context distributional changes, reducing "hard" noises to "easy" ones, whose removal further improves the invariance. Experimental results show that our H2E outperforms state-of-the-art de-noising methods and their ablations in long-tailed settings while maintaining stable performance in the conventional balanced settings. Datasets and codes are available at https://github.com/yxymessi/H2E-Framework
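The abstract above describes an iterative loop: learn a noise identifier, remove the noises it flags, and repeat so that "hard" noises become "easy" ones. The sketch below illustrates that generic identify-and-remove pattern only; it is not the actual H2E method, and the centroid-distance scorer, `n_rounds`, and `drop_frac` are all made-up stand-ins.

```python
import numpy as np

def iterative_denoise(X, y, n_rounds=3, drop_frac=0.05):
    """Toy identify-and-remove loop: score each sample by distance to
    its own class centroid (a crude stand-in for a learned noise
    identifier) and drop the most outlying fraction each round."""
    keep = np.arange(len(X))
    for _ in range(n_rounds):
        Xk, yk = X[keep], y[keep]
        # Re-fit per-class centroids on the currently kept samples.
        centroids = {c: Xk[yk == c].mean(axis=0) for c in np.unique(yk)}
        # Higher score = farther from own-class centroid = more noise-like.
        scores = np.array([np.linalg.norm(x - centroids[c])
                           for x, c in zip(Xk, yk)])
        n_drop = int(drop_frac * len(keep))
        if n_drop == 0:
            break
        # Keep the lowest-scoring samples; the rest are flagged as noise.
        keep = keep[np.argsort(scores)[:len(keep) - n_drop]]
    return keep
```

Each removal round cleans the data the scorer is fit on, so later rounds score the remaining samples against less-corrupted centroids -- a rough analogue of the bootstrapping philosophy described above.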
Related papers
- Pre-train to Gain: Robust Learning Without Clean Labels [1.1582652820340928]
Training deep networks with noisy labels leads to poor generalization and degraded accuracy. By pre-training a feature extractor backbone without labels, we can train a more noise-robust model without requiring a subset with clean labels. Our approach achieves comparable results to ImageNet pre-trained models at low noise levels, while substantially outperforming them under high noise conditions.
arXiv Detail & Related papers (2025-11-25T20:48:07Z) - Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning [59.635264288605946]
Class Incremental Learning (CIL) aims to continuously learn new categories while retaining the knowledge of old ones. Existing approaches that apply lightweight fine-tuning to backbones still induce drift. We propose Mixture of Noise (Min) to mitigate the degradation of backbone generalization when adapting to new tasks.
arXiv Detail & Related papers (2025-09-20T16:07:20Z) - Classifying Long-tailed and Label-noise Data via Disentangling and Unlearning [58.052712054684946]
In real-world datasets, the challenges of long-tailed distributions and noisy labels often coexist.
We propose a novel method called Disentangling and Unlearning for Long-tailed and Label-noisy data.
arXiv Detail & Related papers (2025-03-14T13:58:27Z) - Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation [16.283722126438125]
Label Noise Learning (LNL) incorporates a sample selection stage to differentiate clean and noisy-label samples.
Such curriculum is sub-optimal since it does not consider the actual label noise rate in the training set.
This paper addresses this issue with a new noise-rate estimation method that is easily integrated with most state-of-the-art (SOTA) LNL methods.
arXiv Detail & Related papers (2023-05-31T01:46:14Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples used in previous work.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - Class Prototype-based Cleaner for Label Noise Learning [73.007001454085]
Semi-supervised learning methods are current SOTA solutions to the noisy-label learning problem.
We propose a simple yet effective solution, named Class Prototype-based label noise Cleaner.
arXiv Detail & Related papers (2022-12-21T04:56:41Z) - Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
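A global small-loss threshold tends to starve tail classes, which motivates selection schemes like the one summarized above. The snippet below is a generic per-class small-loss selection, given here only to illustrate the idea; it is not the actual TABASCO algorithm, and `keep_frac` is a hypothetical knob.

```python
import numpy as np

def per_class_small_loss_selection(losses, labels, keep_frac=0.7):
    """Select the lowest-loss fraction of samples *within each class*,
    so tail classes are not wiped out by a single global threshold.
    Illustrative sketch, not the TABASCO method."""
    selected = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Always keep at least one sample per class.
        n_keep = max(1, int(keep_frac * len(idx)))
        # Smallest-loss samples in this class are treated as clean.
        selected.extend(idx[np.argsort(losses[idx])[:n_keep]])
    return np.sort(np.array(selected))
```

Selecting per class rather than globally is the simplest way to respect a long-tailed label distribution: a tail class with uniformly higher losses still contributes its cleanest samples.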
arXiv Detail & Related papers (2022-08-21T07:47:05Z) - Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets [23.4536532321199]
Inspired by our observations, we propose an Uncertainty-aware Label Correction framework to handle label noise on imbalanced datasets.
arXiv Detail & Related papers (2022-07-12T11:35:55Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - Learning From Long-Tailed Data With Noisy Labels [0.0]
Class imbalance and noisy labels are the norm in many large-scale classification datasets.
We present a simple two-stage approach based on recent advances in self-supervised learning.
We find that self-supervised learning approaches are effectively able to cope with severe class imbalance.
arXiv Detail & Related papers (2021-08-25T07:45:40Z) - Denoising Distantly Supervised Named Entity Recognition via a Hypergeometric Probabilistic Model [26.76830553508229]
Hypergeometric Learning (HGL) is a denoising algorithm for distantly supervised named entity recognition.
HGL takes both noise distribution and instance-level confidence into consideration.
Experiments show that HGL can effectively denoise the weakly-labeled data retrieved from distant supervision.
arXiv Detail & Related papers (2021-06-17T04:01:25Z) - Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms existing methods at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z) - Confidence Scores Make Instance-dependent Label-noise Learning Possible [129.84497190791103]
In learning with noisy labels, the label of every instance can randomly walk to other classes following a transition distribution, which is called a noise model.
We introduce confidence-scored instance-dependent noise (CSIDN), where each instance-label pair is equipped with a confidence score.
We find with the help of confidence scores, the transition distribution of each instance can be approximately estimated.
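The transition-distribution idea above can be sketched with the standard "forward" use of a noise transition matrix: the noisy-label posterior is the clean posterior pushed through the matrix. This toy deliberately omits the per-instance confidence scores that CSIDN adds; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def noisy_posterior(clean_probs, T):
    """Map clean-label posteriors to noisy-label posteriors through a
    noise transition matrix T, where T[i, j] = P(noisy = j | true = i).
    Standard forward correction, sketched to illustrate the idea of a
    noise model as a transition distribution."""
    clean_probs = np.asarray(clean_probs, dtype=float)
    T = np.asarray(T, dtype=float)
    # Each row of T is a distribution over observed labels.
    assert np.allclose(T.sum(axis=1), 1.0)
    return clean_probs @ T
```

Instance-dependent noise, as studied in the paper above, replaces the single matrix `T` with one that varies per instance, which is precisely what makes it hard to estimate without extra signals such as confidence scores.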
arXiv Detail & Related papers (2020-01-11T16:15:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.