Label-Noise Learning with Intrinsically Long-Tailed Data
- URL: http://arxiv.org/abs/2208.09833v3
- Date: Mon, 14 Aug 2023 05:29:40 GMT
- Title: Label-Noise Learning with Intrinsically Long-Tailed Data
- Authors: Yang Lu, Yiliang Zhang, Bo Han, Yiu-ming Cheung, Hanzi Wang
- Abstract summary: We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
- Score: 65.41318436799993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Label noise is one of the key factors that lead to the poor generalization of
deep learning models. Existing label-noise learning methods usually assume that
the ground-truth classes of the training data are balanced. However, the
real-world data is often imbalanced, so with label noise the observed class
distribution becomes inconsistent with the intrinsic one. In this case, it is
hard to distinguish clean samples from noisy samples on the intrinsic tail
classes, since the intrinsic class distribution is unknown. In this paper, we
propose a learning framework for label-noise learning with intrinsically
long-tailed data. Specifically, we propose two-stage bi-dimensional sample
selection (TABASCO) to better separate clean samples from noisy samples,
especially for the tail classes. TABASCO consists of two new separation metrics
that complement each other to compensate for the limitation of using a single
metric in sample separation. Extensive experiments on benchmarks demonstrate
the effectiveness of our method. Our code is available at
https://github.com/Wakings/TABASCO.
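As a rough illustration of the idea, the sketch below scores each sample along two dimensions and, per class, keeps the dimension whose two-component Gaussian mixture separates most cleanly. The two metrics used here (per-sample loss and similarity to a class prototype) and the separation criterion are illustrative assumptions, not the paper's exact formulation; the linked repository contains the real implementation.

```python
# Illustrative sketch of bi-dimensional sample selection (not the authors' code).
# Assumed metrics: per-sample training loss and prototype similarity.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_clean(losses, proto_sims, labels, num_classes):
    """Return indices of likely-clean samples, selected class by class.

    losses:     (N,) per-sample training loss (assumed metric 1).
    proto_sims: (N,) similarity to the labeled class prototype (assumed metric 2).
    """
    clean_idx = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        if len(idx) < 2:
            clean_idx.extend(idx)  # too few samples to split; keep them all
            continue
        best_sep, best_mask = -np.inf, None
        # Stage 1: fit a two-component GMM on each metric separately.
        for metric, low_is_clean in ((losses[idx], True), (proto_sims[idx], False)):
            x = metric.reshape(-1, 1)
            gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
            means = gmm.means_.ravel()
            stds = np.sqrt(gmm.covariances_).ravel()
            sep = abs(means[0] - means[1]) / (stds.sum() + 1e-8)
            clean_comp = int(np.argmin(means)) if low_is_clean else int(np.argmax(means))
            mask = gmm.predict(x) == clean_comp
            # Stage 2: keep the metric whose two components separate best.
            if sep > best_sep:
                best_sep, best_mask = sep, mask
        clean_idx.extend(idx[best_mask])
    return np.asarray(clean_idx)
```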
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
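A minimal sketch of the prototype side of this idea, assuming prototypes are per-class means of L2-normalized features and each pseudo label is the nearest prototype; the paper's distribution-matching formulation is not reproduced here.

```python
# Hedged sketch of prototype-based pseudo labeling (design choices assumed).
import numpy as np

def pseudo_label_by_prototype(features, labels, num_classes):
    """Assign each sample the class of its most similar class prototype.

    features: (N, D) embeddings; labels: (N,) observed noisy labels.
    Assumes every class has at least one observed sample.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    sims = feats @ protos.T            # cosine similarity to each prototype
    return sims.argmax(axis=1)         # pseudo label = nearest prototype
```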
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise [4.90148689564172]
Real-world datasets contain noisy label samples that have no semantic relevance to any class in the dataset.
Most state-of-the-art methods leverage in-distribution (ID) noisy-labeled samples as unlabeled data for semi-supervised learning.
We propose incorporating the information from all the training data by leveraging the benefits of self-supervised training.
arXiv Detail & Related papers (2023-08-13T23:33:33Z)
- Differences Between Hard and Noisy-labeled Samples: An Empirical Study [7.132368785057315]
Separating noisy or incorrectly labeled samples from hard/difficult samples in a labeled dataset is an important yet under-explored topic.
We introduce a simple yet effective metric that filters out noisy-labeled samples while keeping the hard samples.
Our proposed data partitioning method significantly outperforms other methods when employed within a semi-supervised learning framework.
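One plausible shape for such a filter, assuming per-epoch predictions are recorded during training: samples whose given label is almost never predicted are flagged as noisy, while samples with low but nonzero agreement are kept as hard. This agreement statistic and its thresholds are illustrative stand-ins, not the paper's metric.

```python
# Illustrative filter separating noisy-labeled from hard samples.
# Assumption: per-epoch predictions are logged during training.
import numpy as np

def split_noisy_vs_hard(pred_history, labels, noisy_thresh=0.1):
    """pred_history: (E, N) predicted class per epoch; labels: (N,) given labels."""
    agreement = (pred_history == labels[None, :]).mean(axis=0)  # (N,) in [0, 1]
    noisy = agreement < noisy_thresh   # the model (almost) never fits the label
    hard = ~noisy & (agreement < 0.5)  # fits sometimes: difficult but plausible
    return noisy, hard
```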
arXiv Detail & Related papers (2023-07-20T09:24:23Z)
- Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise based on inherent properties of multi-label classification and long-tailed learning in noisy settings.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
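A hedged sketch of a Stitch-Up-style augmentation, assuming "stitching" means blending two inputs and taking the union of their multi-hot labels; the paper's exact recipe may differ.

```python
# Hedged sketch of a Stitch-Up-style augmentation for multi-label data.
# Assumption: blend two images and take the union of their label sets,
# so missing-label noise on either sample is partly compensated.
import numpy as np

def stitch_up(x1, y1, x2, y2):
    """x*: (H, W, C) images; y*: (L,) multi-hot label vectors."""
    x = 0.5 * (x1 + x2)       # blend the two inputs
    y = np.maximum(y1, y2)    # union of the observed label sets
    return x, y
```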
arXiv Detail & Related papers (2023-07-03T09:20:28Z)
- Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels [87.48541631675889]
We propose a two-stage clean samples identification method.
First, we employ a class-level feature clustering procedure for the early identification of clean samples.
Second, for the remaining samples that lie close to the ground-truth class boundary, we propose a novel consistency-based classification method.
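A minimal sketch of what the first, clustering-based stage can look like, assuming "centrality" is cosine similarity to the class feature centroid and a fixed fraction per class is kept; both choices are illustrative.

```python
# Sketch of class-level feature clustering for early clean-sample identification.
# Assumption: keep the most central samples (nearest the class centroid) per class.
import numpy as np

def early_clean_ids(features, labels, num_classes, keep_frac=0.5):
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    clean = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        centroid = feats[idx].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        sims = feats[idx] @ centroid              # centrality of each sample
        k = max(1, int(keep_frac * len(idx)))
        clean.extend(idx[np.argsort(-sims)[:k]])  # keep the most central samples
    return np.asarray(clean)
```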
arXiv Detail & Related papers (2022-07-29T04:54:57Z)
- Robust Long-Tailed Learning under Label Noise [50.00837134041317]
This work investigates the label noise problem under long-tailed label distribution.
We propose a robust framework that realizes noise detection for long-tailed learning.
Our framework can naturally leverage semi-supervised learning algorithms to further improve generalization.
arXiv Detail & Related papers (2021-08-26T03:45:00Z)
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning [111.03364864022261]
We propose DivideMix, a framework for learning with noisy labels.
Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
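DivideMix's division step is commonly summarized as fitting a two-component Gaussian mixture to per-sample losses and treating the low-loss component as probably clean; a compact sketch of that step, with an illustrative threshold:

```python
# Sketch of DivideMix-style sample division on per-sample losses.
import numpy as np
from sklearn.mixture import GaussianMixture

def divide(losses, tau=0.5):
    x = np.asarray(losses).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    clean_comp = int(np.argmin(gmm.means_.ravel()))
    p_clean = gmm.predict_proba(x)[:, clean_comp]  # posterior of low-loss component
    return p_clean > tau  # True: keep the label; False: treat as unlabeled for SSL
```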
arXiv Detail & Related papers (2020-02-18T06:20:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.