Identifying Hard Noise in Long-Tailed Sample Distribution
- URL: http://arxiv.org/abs/2207.13378v2
- Date: Fri, 31 Mar 2023 07:03:13 GMT
- Title: Identifying Hard Noise in Long-Tailed Sample Distribution
- Authors: Xuanyu Yi, Kaihua Tang, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang
- Abstract summary: We introduce Noisy Long-Tailed Classification (NLT)
Most de-noising methods fail to identify the hard noises.
We design an iterative noisy learning framework called Hard-to-Easy (H2E)
- Score: 76.16113794808001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional de-noising methods rely on the assumption that all samples are
independent and identically distributed, so the resultant classifier, though
disturbed by noise, can still easily identify the noises as outliers of the
training distribution. However, the assumption is unrealistic in large-scale
data that is inevitably long-tailed. Such imbalanced training data makes a
classifier less discriminative for the tail classes, whose previously "easy"
noises are now turned into "hard" ones -- they are nearly as outlier-like as
the clean tail samples. We introduce this new challenge as Noisy Long-Tailed
Classification (NLT). Not surprisingly, we find that most de-noising methods
fail to identify the hard noises, resulting in a significant performance drop on
the three proposed NLT benchmarks: ImageNet-NLT, Animal10-NLT, and Food101-NLT.
To this end, we design an iterative noisy learning framework called
Hard-to-Easy (H2E). Our bootstrapping philosophy is to first learn a classifier
as noise identifier invariant to the class and context distributional changes,
reducing "hard" noises to "easy" ones, whose removal further improves the
invariance. Experimental results show that our H2E outperforms state-of-the-art
de-noising methods and their ablations on long-tailed settings while
maintaining a stable performance on the conventional balanced settings.
Datasets and codes are available at https://github.com/yxymessi/H2E-Framework
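The bootstrapping philosophy described in the abstract (learn a noise identifier, reduce "hard" noises to "easy" ones, remove them, and repeat) can be illustrated with a toy filter-and-refit loop. This is a minimal sketch, not the authors' H2E implementation: the 1-D nearest-class-mean "classifier", the distance-based noise score, and the fixed per-round drop ratio are all illustrative assumptions.

```python
import random

random.seed(0)

def make_noisy_data(n=200, flip=0.2):
    """Two 1-D Gaussian classes; each label is flipped with probability `flip`."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        x = random.gauss(0.0 if y == 0 else 3.0, 1.0)
        noisy_y = 1 - y if random.random() < flip else y
        data.append((x, noisy_y, y))  # (feature, noisy label, true label)
    return data

def class_means(samples):
    """Fit a trivial nearest-class-mean classifier on the (noisy) labels."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y, _ in samples:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / max(counts[c], 1) for c in (0, 1)}

def iterative_filter(samples, rounds=3, drop_ratio=0.1):
    """Each round: refit, score samples by distance to their labeled class
    mean, and drop the worst `drop_ratio` as suspected noise.  Removing the
    easy-to-spot noise makes the next fit cleaner, so previously hard noise
    becomes easier to spot -- the bootstrapping idea, in miniature."""
    kept = list(samples)
    for _ in range(rounds):
        means = class_means(kept)
        scored = sorted(kept, key=lambda s: abs(s[0] - means[s[1]]))
        kept = scored[: int(len(scored) * (1 - drop_ratio))]
    return kept

data = make_noisy_data()
kept = iterative_filter(data)
noise_before = sum(1 for x, y, t in data if y != t) / len(data)
noise_after = sum(1 for x, y, t in kept if y != t) / len(kept)
print(f"noise rate before: {noise_before:.2f}, after: {noise_after:.2f}")
```

The point of the loop, as in the abstract, is that filtering and refitting reinforce each other: each refit on cleaner data sharpens the noise score used in the next round.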
Related papers
- Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation [16.283722126438125]
Label Noise Learning (LNL) incorporates a sample selection stage to differentiate clean and noisy-label samples.
Such a curriculum is sub-optimal since it does not consider the actual label noise rate in the training set.
This paper addresses this issue with a new noise-rate estimation method that is easily integrated with most state-of-the-art (SOTA) LNL methods.
arXiv Detail & Related papers (2023-05-31T01:46:14Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, which avoids previous arbitrarily tuning from a mini-batch of samples.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z)
- Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets [23.4536532321199]
We propose an Uncertainty-aware Label Correction framework to handle label noise on imbalanced datasets.
arXiv Detail & Related papers (2022-07-12T11:35:55Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Learning From Long-Tailed Data With Noisy Labels [0.0]
Class imbalance and noisy labels are the norm in many large-scale classification datasets.
We present a simple two-stage approach based on recent advances in self-supervised learning.
We find that self-supervised learning approaches are effectively able to cope with severe class imbalance.
arXiv Detail & Related papers (2021-08-25T07:45:40Z)
- Denoising Distantly Supervised Named Entity Recognition via a Hypergeometric Probabilistic Model [26.76830553508229]
Hypergeometric Learning (HGL) is a denoising algorithm for distantly supervised named entity recognition.
HGL takes both noise distribution and instance-level confidence into consideration.
Experiments show that HGL can effectively denoise the weakly-labeled data retrieved from distant supervision.
arXiv Detail & Related papers (2021-06-17T04:01:25Z)
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
- Confidence Scores Make Instance-dependent Label-noise Learning Possible [129.84497190791103]
In learning with noisy labels, every instance's label can randomly walk to other classes following a transition distribution known as the noise model.
We introduce confidence-scored instance-dependent noise (CSIDN), where each instance-label pair is equipped with a confidence score.
We find that, with the help of confidence scores, the transition distribution of each instance can be approximately estimated.
arXiv Detail & Related papers (2020-01-11T16:15:41Z)
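The "random walk to other classes" view in the last entry is the standard noise-transition model, with a matrix T where T[i][j] = P(noisy label = j | true label = i). A minimal sketch, using a made-up 3-class transition matrix (not one estimated by CSIDN or any paper above):

```python
import random

random.seed(1)

# Hypothetical class-conditional noise transition matrix:
# row i gives the distribution of the noisy label given true class i.
T = [
    [0.8, 0.1, 0.1],  # true class 0 keeps its label 80% of the time
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]

def corrupt(label):
    """Sample a noisy label for `label` by inverting the CDF of row T[label]."""
    r, acc = random.random(), 0.0
    for j, p in enumerate(T[label]):
        acc += p
        if r < acc:
            return j
    return len(T) - 1

true_labels = [random.randint(0, 2) for _ in range(3000)]
noisy_labels = [corrupt(y) for y in true_labels]

# Re-estimate the transition matrix empirically from (true, noisy) pairs;
# with enough samples each row converges to the corresponding row of T.
counts = [[0] * 3 for _ in range(3)]
for y, yn in zip(true_labels, noisy_labels):
    counts[y][yn] += 1
est = [[c / max(sum(row), 1) for c in row] for row in counts]
print(est[0])  # close to T[0] = [0.8, 0.1, 0.1]
```

In practice the true labels are unknown, which is exactly why methods such as LCCN and CSIDN above introduce Bayesian inference or per-instance confidence scores to estimate the transition distribution.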
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above (including all listed details) and is not responsible for any consequences of its use.