Learning From Long-Tailed Data With Noisy Labels
- URL: http://arxiv.org/abs/2108.11096v1
- Date: Wed, 25 Aug 2021 07:45:40 GMT
- Title: Learning From Long-Tailed Data With Noisy Labels
- Authors: Shyamgopal Karthik, Jérôme Revaud and Boris Chidlovskii
- Abstract summary: Class imbalance and noisy labels are the norm in many large-scale classification datasets.
We present a simple two-stage approach based on recent advances in self-supervised learning.
We find that self-supervised learning approaches can cope effectively with severe class imbalance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class imbalance and noisy labels are the norm rather than the exception in
many large-scale classification datasets. Nevertheless, most works in machine
learning typically assume balanced and clean data. There have been some recent
attempts to tackle, on one side, the problem of learning from noisy labels and,
on the other side, learning from long-tailed data. Each group of methods makes
simplifying assumptions about the other. Due to this separation, the proposed
solutions often underperform when both assumptions are violated. In this work,
we present a simple two-stage approach based on recent advances in
self-supervised learning to treat both challenges simultaneously. It consists
of, first, task-agnostic self-supervised pre-training, followed by
task-specific fine-tuning using an appropriate loss. Most significantly, we
find that self-supervised learning approaches can cope effectively with
severe class imbalance. In addition, the resulting representations are
also remarkably robust to label noise when fine-tuned with an imbalance- and
noise-resistant loss function. We validate our claims with experiments on
CIFAR-10 and CIFAR-100 augmented with synthetic imbalance and noise, as well as
the large-scale inherently noisy Clothing-1M dataset.
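The two-stage recipe in the abstract above maps onto a short training loop. Below is a minimal sketch, assuming PyTorch: stage one yields a self-supervised encoder (e.g. trained with SimCLR or MoCo, not shown), and stage two fine-tunes a classification head with an imbalance-resistant loss. The logit-adjusted cross-entropy and the frozen encoder are illustrative assumptions of this sketch; the abstract does not name the exact loss used.

```python
# Sketch of the two-stage approach: self-supervised pre-training (stage 1,
# not shown) followed by fine-tuning with an imbalance-resistant loss.
# The specific loss below (logit adjustment) is an assumption, not
# necessarily the one used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogitAdjustedCE(nn.Module):
    """Cross-entropy with logits shifted by log class priors, which
    counteracts the head-class bias induced by an imbalanced training set."""
    def __init__(self, class_counts, tau=1.0):
        super().__init__()
        priors = class_counts.float() / class_counts.sum()
        self.register_buffer("offset", tau * priors.log())

    def forward(self, logits, targets):
        return F.cross_entropy(logits + self.offset, targets)

def finetune(encoder, head, loader, class_counts, epochs=10, lr=1e-3):
    """Stage 2: task-specific fine-tuning on top of the stage-1
    self-supervised encoder (kept frozen in this sketch)."""
    criterion = LogitAdjustedCE(class_counts)
    opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=0.9)
    encoder.eval()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                z = encoder(x)            # representations stay fixed
            loss = criterion(head(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The log-prior offset cancels the head-class advantage at the logit level, which is one reason such losses tolerate severe imbalance.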
Related papers
- Conformal-in-the-Loop for Learning with Imbalanced Noisy Data [5.69777817429044]
Class imbalance and label noise are pervasive in large-scale datasets.
Much of machine learning research assumes well-labeled, balanced data, which rarely reflects real-world conditions.
We propose Conformal-in-the-Loop (CitL), a novel training framework that addresses both challenges with a conformal prediction-based approach.
arXiv Detail & Related papers (2024-11-04T17:09:58Z)
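To make the conformal-prediction ingredient of CitL concrete, here is a minimal sketch of split conformal classification, assuming NumPy; the names `cal_probs` (calibration-set softmax outputs) and `test_probs` are hypothetical. CitL's actual in-the-loop weighting is not detailed in the summary above, so this shows only the underlying mechanism.

```python
# Split conformal classification: calibrate a score cutoff, then emit
# prediction sets whose size reflects uncertainty.
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Nonconformity score = 1 - p(true class); the cutoff is the
    ceil((n+1)(1-alpha))/n empirical quantile of calibration scores."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(scores)[min(k, n) - 1]

def prediction_set(test_probs, qhat):
    """Classes whose score falls under the cutoff; unusually large or
    empty sets flag ambiguous, possibly mislabeled examples."""
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]
```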
- Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures [15.358504449550013]
We design algorithms to learn from noisy labels for two broad classes of non-decomposable performance measures.
In both cases, we develop noise-corrected versions of the algorithms under the widely studied class-conditional noise models.
Our experiments demonstrate the effectiveness of our algorithms in handling label noise.
arXiv Detail & Related papers (2024-02-01T23:03:53Z)
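One widely used instance of a noise-corrected algorithm under class-conditional noise is backward loss correction, where the vector of per-class losses is multiplied by the inverse of the noise transition matrix. The sketch below assumes PyTorch and a known transition matrix `T`; the paper's specific corrections for non-decomposable measures may differ.

```python
# Backward loss correction under class-conditional noise (Patrini et al.,
# 2017), shown as one standard example of a "noise-corrected" algorithm.
import torch
import torch.nn.functional as F

def backward_corrected_ce(logits, noisy_targets, T):
    """T[i, j] = P(observed label j | true label i). Mixing the per-class
    losses with T^{-1} makes the expected loss match the clean-label loss."""
    per_class_loss = -F.log_softmax(logits, dim=1)         # (N, C)
    corrected = per_class_loss @ torch.linalg.inv(T).T     # (N, C)
    return corrected[torch.arange(len(noisy_targets)), noisy_targets].mean()
```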
- DIRECT: Deep Active Learning under Imbalance and Label Noise [15.571923343398657]
We conduct the first study of active learning under both class imbalance and label noise.
We propose a novel algorithm that robustly identifies the class separation threshold and annotates the most uncertain examples.
Our results demonstrate that DIRECT can save more than 60% of the annotation budget compared to state-of-the-art active learning algorithms.
arXiv Detail & Related papers (2023-12-14T18:18:34Z)
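The "annotate the most uncertain examples" step can be illustrated with plain margin-based uncertainty sampling, assuming NumPy. DIRECT's robust estimation of the class separation threshold is more involved and is not reproduced here.

```python
# Margin-based uncertainty sampling: query the unlabeled examples whose
# top-2 predicted probabilities are closest, i.e. the most ambiguous ones.
import numpy as np

def select_for_annotation(probs, budget):
    """probs: (N, C) softmax outputs on the unlabeled pool.
    Returns indices of the `budget` smallest-margin examples."""
    ordered = np.sort(probs, axis=1)
    margins = ordered[:, -1] - ordered[:, -2]   # top1 - top2
    return np.argsort(margins)[:budget]         # smallest margins first
```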
- Identifying Hard Noise in Long-Tailed Sample Distribution [76.16113794808001]
We introduce Noisy Long-Tailed Classification (NLT).
Most de-noising methods fail to identify hard noise.
We design an iterative noisy-learning framework called Hard-to-Easy (H2E).
arXiv Detail & Related papers (2022-07-27T09:03:03Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
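A minimal sketch of the neighbor-consistency idea, assuming PyTorch: each example's prediction is regularized toward a similarity-weighted average of its nearest neighbors' predictions in feature space, so an isolated noisy label is outvoted by its neighborhood. The choice of k, the temperature, and the KL form are assumptions of this sketch, not the paper's exact loss.

```python
# Neighbor-consistency regularizer: pull each prediction toward the
# similarity-weighted average of its k nearest neighbors' predictions.
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(features, logits, k=10, tau=0.1):
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.T                               # cosine similarity
    sim.fill_diagonal_(float("-inf"))                   # exclude self-match
    topk, idx = sim.topk(k, dim=1)
    w = F.softmax(topk / tau, dim=1)                    # neighbor weights
    probs = F.softmax(logits, dim=1)
    neigh = (w.unsqueeze(-1) * probs[idx]).sum(dim=1)   # (N, C) average
    return F.kl_div(F.log_softmax(logits, dim=1), neigh.detach(),
                    reduction="batchmean")
```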
- Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, a classifier that does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on the CIFAR-10-LT, CIFAR-100-LT and WebVision datasets, observing that Prototypical obtains substantial improvements over the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
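The parameter-free claim above can be made concrete with a generic prototypical classifier, assuming PyTorch: prototypes are per-class feature means, and prediction is nearest prototype by cosine similarity. This is the standard construction, not necessarily the paper's exact variant.

```python
# Generic prototypical classifier: no learned parameters beyond the
# embedding network. Assumes every class appears at least once.
import torch
import torch.nn.functional as F

def class_prototypes(features, labels, num_classes):
    protos = torch.zeros(num_classes, features.size(1))
    for c in range(num_classes):
        protos[c] = features[labels == c].mean(dim=0)   # per-class mean
    return F.normalize(protos, dim=1)

def predict(features, protos):
    feats = F.normalize(features, dim=1)
    return (feats @ protos.T).argmax(dim=1)   # nearest prototype (cosine)
```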
- Robust Long-Tailed Learning under Label Noise [50.00837134041317]
This work investigates the label noise problem under a long-tailed label distribution.
We propose a robust framework that realizes noise detection for long-tailed learning.
Our framework can naturally leverage semi-supervised learning algorithms to further improve the generalisation.
arXiv Detail & Related papers (2021-08-26T03:45:00Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that errors in human-annotated labels are more likely to depend on the difficulty level of the task.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- EvidentialMix: Learning with Combined Open-set and Closed-set Noisy Labels [30.268962418683955]
We study a new variant of the noisy label problem that combines open-set and closed-set noisy labels.
Our results show that our method produces superior classification results and better feature representations than previous state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:15:32Z)
- Learning from Noisy Similar and Dissimilar Data [84.76686918337134]
We show how to learn a classifier from noisy similar (S) and dissimilar (D) labeled data.
We also show important connections between learning from such pairwise supervision data and learning from ordinary class-labeled data.
arXiv Detail & Related papers (2020-02-03T19:59:16Z)
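To make the similar/dissimilar (S/D) setting concrete, here is a generic contrastive-style pairwise objective, assuming PyTorch: S pairs are pulled together, D pairs pushed apart up to a margin. The paper's contribution is a noise-robust way to learn from such (possibly flipped) pair labels, which this plain loss does not attempt.

```python
# Contrastive margin loss over S/D pairs -- the generic (noise-agnostic)
# form of pairwise supervision this paper builds on.
import torch
import torch.nn.functional as F

def pairwise_sd_loss(z1, z2, pair_is_similar, margin=1.0):
    """z1, z2: embeddings of the two items in each pair.
    pair_is_similar: float tensor, 1.0 for S pairs, 0.0 for D pairs."""
    dist = F.pairwise_distance(z1, z2)
    pos = pair_is_similar * dist.pow(2)                         # pull S pairs
    neg = (1 - pair_is_similar) * F.relu(margin - dist).pow(2)  # push D pairs
    return (pos + neg).mean()
```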
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.