Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss
- URL: http://arxiv.org/abs/2211.10906v3
- Date: Sun, 28 May 2023 08:39:46 GMT
- Title: Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss
- Authors: Lefan Zhang, Zhang-Hao Tian, Wujun Zhou, Wei Wang
- Abstract summary: We propose a robust method for learning from long-tailed noisy data with sample selection and balanced loss.
Specifically, we separate the noisy training data into a clean labeled set and an unlabeled set with sample selection, and train the deep neural network in a semi-supervised manner.
- Score: 8.71234615872208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep learning depends on large-scale and well-curated training
data, while data in real-world applications are commonly long-tailed and noisy.
Many methods have been proposed to deal with long-tailed data or noisy data,
but few have been developed to tackle long-tailed noisy data. To address this,
we propose a robust method for learning from long-tailed noisy data with
sample selection and balanced loss. Specifically, we separate the noisy
training data into a clean labeled set and an unlabeled set with sample
selection, and train the deep neural network in a semi-supervised manner with a balanced
loss based on model bias. Extensive experiments on benchmarks demonstrate that
our method outperforms existing state-of-the-art methods.
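The abstract does not give implementation details, but its two ingredients are standard enough to sketch. Below is a minimal sketch assuming a DivideMix-style split (a two-component Gaussian mixture fitted to per-sample losses, with the low-loss component treated as clean) and a logit-adjustment-style balanced loss; the function names and the particular choice of adjustment are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

def select_clean(per_sample_loss):
    """Split training samples into a clean labeled set and an unlabeled set
    by fitting a two-component GMM to per-sample losses (a common small-loss
    selection heuristic; the paper's exact criterion may differ)."""
    losses = per_sample_loss.reshape(-1, 1).cpu().numpy()
    gmm = GaussianMixture(n_components=2, reg_covar=5e-4).fit(losses)
    clean_component = gmm.means_.argmin()            # low-loss component = likely clean
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    return torch.from_numpy(p_clean) > 0.5           # boolean mask over the dataset

def balanced_ce(logits, targets, class_prior, tau=1.0):
    """Logit-adjusted cross-entropy: subtracting log class priors counters the
    head-class bias of a model trained on long-tailed data. This is one
    standard way to build a 'balanced loss'; the paper derives its
    adjustment from the model's bias rather than from class frequencies."""
    adjusted = logits - tau * torch.log(class_prior + 1e-12)
    return F.cross_entropy(adjusted, targets)
```

The selected clean set would then serve as labeled data and the remainder as unlabeled data for a semi-supervised learner (e.g., a MixMatch/FixMatch-style routine), with the balanced loss applied on the labeled branch.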
Related papers
- Learning from Noisy Labels for Long-tailed Data via Optimal Transport [2.8821062918162146]
We propose a novel approach to manage data characterized by both long-tailed distributions and noisy labels.
We employ optimal transport strategies to generate pseudo-labels for the noise set in a semi-supervised training manner.
arXiv Detail & Related papers (2024-08-07T14:15:18Z)
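The entry above mentions optimal-transport pseudo-labeling without details. A generic Sinkhorn-Knopp sketch, under the assumption that pseudo-labels for the noise set are assigned subject to a target class marginal so that tail classes are not starved; this is a common self-labeling construction, not necessarily this paper's exact transport formulation.

```python
import torch

def sinkhorn_pseudo_labels(probs, class_marginal, n_iters=50, eps=0.1):
    """Assign pseudo-labels by entropically regularized optimal transport:
    rows are samples, columns are classes, and column sums are constrained
    to `class_marginal` (must sum to 1). A generic Sinkhorn-Knopp sketch."""
    # Cost = -log p(y|x); the Gibbs kernel exp(-cost/eps) equals p**(1/eps).
    # float64 avoids underflow when 1/eps is large.
    K = probs.double().clamp_min(1e-12) ** (1.0 / eps)
    n = K.size(0)
    u = torch.full((n,), 1.0 / n, dtype=torch.float64)  # row marginal: uniform over samples
    v = class_marginal.double()                         # column marginal: target class mix
    a = torch.ones(n, dtype=torch.float64)
    for _ in range(n_iters):                            # alternating marginal scaling
        b = v / (K.t() @ a).clamp_min(1e-30)
        a = u / (K @ b).clamp_min(1e-30)
    plan = a.unsqueeze(1) * K * b.unsqueeze(0)          # rows ~ u, columns ~ v
    return plan.argmax(dim=1)                           # hard pseudo-label per sample
```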
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z)
- Learning to Abstain From Uninformative Data [20.132146513548843]
We study the problem of learning and acting under a general noisy generative process.
In this problem, the data distribution has a significant proportion of uninformative samples with high noise in the label.
We propose a novel approach to learning under these conditions via a loss inspired by the selective learning theory.
arXiv Detail & Related papers (2023-09-25T15:55:55Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
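Late Stopping's key observation, that mislabeled examples take more epochs to become consistently and correctly classified, is easy to sketch. The window length and the "consistent" criterion below are illustrative assumptions, not the paper's exact rule.

```python
import torch

def consistently_learned_epoch(correct_history, window=3):
    """For each example, find the first epoch after which it stays correctly
    classified for `window` consecutive epochs. Clean examples tend to be
    learned early; examples learned late (or never) are flagged as likely
    mislabeled."""
    # correct_history: bool tensor of shape (n_epochs, n_samples)
    n_epochs, n_samples = correct_history.shape
    first = torch.full((n_samples,), n_epochs)            # n_epochs == "never consistent"
    for t in range(n_epochs - window + 1):
        run = correct_history[t:t + window].all(dim=0)    # correct for the whole window
        new = run & (first == n_epochs)                   # not yet assigned an epoch
        first[new] = t
    return first                                          # small value -> likely clean
```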
- Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise by exploiting inherent properties of multi-label classification and long-tailed learning in the noisy setting.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
arXiv Detail & Related papers (2023-07-03T09:20:28Z)
- MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels [19.650299232829546]
We propose an iterative selection approach based on the Weibull mixture model to identify clean data.
In particular, we measure the memorization difficulty of each instance via the number of transitions between being misclassified and being memorized.
Our strategy outperforms existing noisy-label learning methods.
arXiv Detail & Related papers (2023-06-20T14:26:53Z)
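A hedged sketch of the transition-time statistic described in the entry above. MILD then fits a Weibull mixture model to such statistics to select clean data; that fitting step is omitted here, and the exact definition of a "transition" is an assumption.

```python
import torch

def transition_times(pred_history, labels):
    """Count, per instance, how many times the model's prediction flips between
    'misclassified' and 'memorized' (predicted class == given label) across
    epochs. Frequent flips indicate hard or noisy instances."""
    # pred_history: (n_epochs, n_samples) predicted class per epoch
    # labels: (n_samples,) given (possibly noisy) labels
    memorized = pred_history == labels.unsqueeze(0)   # (n_epochs, n_samples) bool
    flips = memorized[1:] ^ memorized[:-1]            # XOR marks a state change
    return flips.sum(dim=0)                           # many flips -> likely noisy
```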
- Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z)
- Robust Long-Tailed Learning under Label Noise [50.00837134041317]
This work investigates the label noise problem under long-tailed label distribution.
We propose a robust framework that performs noise detection for long-tailed learning.
Our framework can naturally leverage semi-supervised learning algorithms to further improve generalization.
arXiv Detail & Related papers (2021-08-26T03:45:00Z)
- Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
The main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.