Tackling Instance-Dependent Label Noise with Dynamic Distribution Calibration
- URL: http://arxiv.org/abs/2210.05126v1
- Date: Tue, 11 Oct 2022 03:50:52 GMT
- Title: Tackling Instance-Dependent Label Noise with Dynamic Distribution Calibration
- Authors: Manyi Zhang, Yuxin Ren, Zihao Wang, Chun Yuan
- Abstract summary: Instance-dependent label noise is realistic but rather challenging, where the label-corruption process depends on instances directly.
It causes a severe distribution shift between the distributions of training and test data, which impairs the generalization of trained models.
In this paper, to address the distribution shift in learning with instance-dependent label noise, a dynamic distribution-calibration strategy is adopted.
- Score: 18.59803726676361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instance-dependent label noise is realistic but rather challenging, where the
label-corruption process depends on instances directly. It causes a severe
distribution shift between the distributions of training and test data, which
impairs the generalization of trained models. Prior works have put great effort
into tackling this issue. Unfortunately, they either rely heavily on strong
assumptions or remain heuristic without theoretical guarantees. In this paper,
to address the distribution shift in learning with instance-dependent label
noise, a dynamic distribution-calibration strategy is adopted. Specifically, we
hypothesize that, before training data are corrupted by label noise, each class
conforms to a multivariate Gaussian distribution at the feature level. Label
noise produces outliers to shift the Gaussian distribution. During training, to
calibrate the shifted distribution, we propose two methods based on the mean
and covariance of the multivariate Gaussian distribution, respectively. The
mean-based method works in a recursive dimension-reduction manner for robust
mean estimation, which is theoretically guaranteed to train a high-quality
model against label noise. The covariance-based method works in a distribution
disturbance manner, which is experimentally verified to improve the model
robustness. We demonstrate the utility and effectiveness of our methods on
datasets with synthetic label noise and real-world unknown noise.
Related papers
- Inaccurate Label Distribution Learning with Dependency Noise [52.08553913094809]
We introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning.
We show that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.
arXiv Detail & Related papers (2024-05-26T07:58:07Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
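A common concrete form of prototype-based pseudo labeling, sketched below as an illustration rather than this paper's distribution-matching method, assigns each sample the label of its most similar class prototype and then keeps a fixed number of top-confidence samples per class to obtain a class-balanced, plausibly clean subset. The cosine-similarity scoring and per-class cap are assumptions of this sketch.

```python
import numpy as np

def prototype_pseudo_labels(features, prototypes):
    """Label each sample by its nearest class prototype under cosine
    similarity; also return the similarity as a confidence score."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = f @ p.T  # shape: (n_samples, n_classes)
    return sims.argmax(axis=1), sims.max(axis=1)

def balanced_clean_subset(labels, scores, per_class):
    """Keep the per_class highest-confidence samples of each class,
    yielding a class-balanced subset of indices."""
    keep = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        top = idx[np.argsort(scores[idx])[::-1][:per_class]]
        keep.extend(top.tolist())
    return sorted(keep)
```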
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method [40.25499257944916]
Real-world datasets are both noisily labeled and class-imbalanced.
We propose a representation calibration method RCAL.
We derive theoretical results to discuss the effectiveness of our representation calibration.
arXiv Detail & Related papers (2022-11-20T11:36:48Z)
- Combating Noisy Labels in Long-Tailed Image Classification [33.40963778043824]
This paper makes an early effort to tackle the image classification task with both long-tailed distribution and label noise.
Existing noise-robust learning methods cannot work in this scenario as it is challenging to differentiate noisy samples from clean samples of tail classes.
We propose a new learning paradigm based on matching between inferences on weak and strong data augmentations to screen out noisy samples.
arXiv Detail & Related papers (2022-09-01T07:31:03Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
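The randomization step behind that guarantee can be caricatured in a few lines: classify many randomly transformed copies of the input and take a majority vote over the predictions. `model`, `sample_transform`, and the sample count below are placeholders, and this sketch carries none of the paper's Wasserstein certification machinery.

```python
from collections import Counter

def smoothed_predict(model, x, sample_transform, n_samples=100):
    """Majority-vote prediction over randomly transformed copies of x.

    model            -- maps an input to a class index
    sample_transform -- draws one random transformation of the input
    """
    votes = Counter(model(sample_transform(x)) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

For example, a threshold classifier voted over additive-noise copies of a clearly positive input still returns the positive class, since most perturbed copies stay on the same side of the boundary.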
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Ensemble Learning with Manifold-Based Data Splitting for Noisy Label Correction [20.401661156102897]
Noisy labels in training data can significantly degrade a model's generalization performance.
We propose an ensemble learning method to correct noisy labels by exploiting the local structures of the feature manifold.
Our experiments on real-world noisy label datasets demonstrate the superiority of the proposed method over existing state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-13T07:24:58Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.