Towards the Identifiability in Noisy Label Learning: A Multinomial
Mixture Approach
- URL: http://arxiv.org/abs/2301.01405v2
- Date: Sun, 16 Apr 2023 07:48:11 GMT
- Title: Towards the Identifiability in Noisy Label Learning: A Multinomial
Mixture Approach
- Authors: Cuong Nguyen, Thanh-Toan Do, Gustavo Carneiro
- Abstract summary: Learning from noisy labels (LNL) plays a crucial role in deep learning.
The most promising LNL methods rely on identifying clean-label samples from a dataset with noisy annotations.
We propose a method that automatically generates additional noisy labels by estimating the noisy label distribution based on nearest neighbours.
- Score: 37.32107678838193
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from noisy labels (LNL) plays a crucial role in deep learning. The
most promising LNL methods rely on identifying clean-label samples from a
dataset with noisy annotations. Such an identification is challenging because
the conventional LNL problem, which assumes a single noisy label per instance,
is non-identifiable, i.e., clean labels cannot be estimated theoretically
without additional heuristics. In this paper, we aim to formally investigate
this identifiability issue using multinomial mixture models to determine the
constraints that make the problem identifiable. Specifically, we discover that
the LNL problem becomes identifiable if there are at least $2C - 1$ noisy
labels per instance, where $C$ is the number of classes. To meet this
requirement without relying on additional $2C - 2$ manual annotations per
instance, we propose a method that automatically generates additional noisy
labels by estimating the noisy label distribution based on nearest neighbours.
These additional noisy labels enable us to apply the Expectation-Maximisation
algorithm to estimate the posterior probabilities of clean labels, which are
then used to train the model of interest. We empirically demonstrate that our
proposed method is capable of estimating clean labels without any heuristics in
several label noise benchmarks, including synthetic, web-controlled, and
real-world label noises. Furthermore, our method performs competitively with
many state-of-the-art methods.
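The pipeline described in the abstract, generating extra noisy labels from nearest neighbours and then running Expectation-Maximisation on a multinomial mixture to recover clean-label posteriors, can be sketched as below. This is an illustrative reconstruction, not the authors' implementation: the function names, the Euclidean k-NN rule, and the smoothing constant in the M-step are assumptions made for the sketch.

```python
import numpy as np

def knn_noisy_label_counts(features, noisy_labels, C, k):
    """Build a per-instance count vector over the C noisy-label values from
    the labels of the k nearest neighbours plus the instance's own label
    (a hypothetical stand-in for the paper's noisy-label generation step)."""
    # Pairwise squared Euclidean distances between all instances.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k + 1]          # self is at distance 0
    counts = np.zeros((len(features), C))
    for i, idx in enumerate(nn):
        np.add.at(counts[i], noisy_labels[idx], 1)  # tally neighbour labels
    return counts

def em_multinomial_mixture(counts, C, n_iter=50, seed=0):
    """EM for a mixture of multinomials: component c models the noisy-label
    distribution of instances whose clean label is c.

    Returns post (N, C) with P(clean label = c | noisy labels of instance i),
    mixing weights pi (C,), and theta (C, C) with
    theta[c, j] = P(noisy label j | clean label c)."""
    rng = np.random.default_rng(seed)
    pi = np.full(C, 1.0 / C)
    theta = rng.dirichlet(np.ones(C), size=C)       # random row-stochastic init
    for _ in range(n_iter):
        # E-step: log P(counts_i | c) = sum_j counts[i, j] * log theta[c, j]
        log_post = np.log(pi) + counts @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)   # for stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and per-class noisy-label distributions.
        pi = post.mean(axis=0)
        theta = post.T @ counts + 1e-6              # smoothing avoids log(0)
        theta /= theta.sum(axis=1, keepdims=True)
    return post, pi, theta
```

The resulting posteriors would then serve as soft targets for training the model of interest; note that, as with any mixture model, the recovered components are identified only up to a permutation of the class labels.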
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Multi-Label Noise Transition Matrix Estimation with Label Correlations: Theory and Algorithm [73.94839250910977]
Noisy multi-label learning has garnered increasing attention due to the challenges posed by collecting large-scale accurate labels.
The introduction of transition matrices can help model multi-label noise and enable the development of statistically consistent algorithms.
We propose a novel estimator that leverages label correlations without the need for anchor points or precise fitting of noisy class posteriors.
arXiv Detail & Related papers (2023-09-22T08:35:38Z)
- Partial Label Supervision for Agnostic Generative Noisy Label Learning [18.29334728940232]
Noisy label learning has been tackled with both discriminative and generative approaches.
We propose a novel framework for generative noisy label learning that addresses these challenges.
arXiv Detail & Related papers (2023-08-02T14:48:25Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- BadLabel: A Robust Perspective on Evaluating and Enhancing Label-noise Learning [113.8799653759137]
We introduce a novel label noise type called BadLabel, which can degrade the performance of existing LNL algorithms by a large margin.
BadLabel is crafted based on the label-flipping attack against standard classification.
We propose a robust LNL method that perturbs the labels in an adversarial manner at each epoch to make the loss values of clean and noisy labels again distinguishable.
arXiv Detail & Related papers (2023-05-28T06:26:23Z)
- Instance-dependent Label Distribution Estimation for Learning with Label Noise [20.479674500893303]
Noise transition matrix (NTM) estimation is a promising approach for learning with label noise.
We propose an Instance-dependent Label Distribution Estimation (ILDE) method to learn from noisy labels for image classification.
Our results indicate that the proposed ILDE method outperforms all competing methods, whether the noise is synthetic or real.
arXiv Detail & Related papers (2022-12-16T10:13:25Z)
- An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels [0.9699640804685629]
Large-scale datasets tend to contain mislabeled samples that can be memorized by deep neural networks (DNNs).
We present Ensemble Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select clean samples from noisy data.
We evaluate our approach on various image and text classification tasks where the labels have been manually corrupted with different noise ratios.
arXiv Detail & Related papers (2021-07-06T02:14:52Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.