Robustness and reliability when training with noisy labels
- URL: http://arxiv.org/abs/2110.03321v1
- Date: Thu, 7 Oct 2021 10:30:20 GMT
- Title: Robustness and reliability when training with noisy labels
- Authors: Amanda Olmin and Fredrik Lindsten
- Abstract summary: Labelling of data for supervised learning can be costly and time-consuming.
Deep neural networks have proved capable of fitting random labels, regularisation and the use of robust loss functions.
- Score: 12.688634089849023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labelling of data for supervised learning can be costly and time-consuming
and the risk of incorporating label noise in large data sets is imminent. If
training a flexible discriminative model using a strictly proper loss, such
noise will inevitably shift the solution towards the conditional distribution
over noisy labels. Nevertheless, while deep neural networks have proved capable
of fitting random labels, regularisation and the use of robust loss functions
empirically mitigate the effects of label noise. However, such observations
concern robustness in accuracy, which is insufficient if reliable uncertainty
quantification is critical. We demonstrate this by analysing the properties of
the conditional distribution over noisy labels for an input-dependent noise
model. In addition, we evaluate the set of robust loss functions characterised
by an overlap in asymptotic risk minimisers under the clean and noisy data
distributions. We find that strictly proper and robust loss functions both
offer asymptotic robustness in accuracy, but neither guarantee that the
resulting model is calibrated. Moreover, overfitting is an issue in practice.
With these results, we aim to explain inherent robustness of algorithms to
label noise and to give guidance in the development of new noise-robust
algorithms.
Related papers
- SoftPatch: Unsupervised Anomaly Detection with Noisy Data [67.38948127630644]
This paper considers label-level noise in image sensory anomaly detection for the first time.
We propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level.
Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
arXiv Detail & Related papers (2024-03-21T08:49:34Z) - Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z) - Label Noise: Correcting the Forward-Correction [0.0]
Training neural network classifiers on datasets with label noise poses a risk of overfitting them to the noisy labels.
We propose an approach to tackling overfitting caused by label noise.
Motivated by this observation, we propose imposing a lower bound on the training loss to mitigate overfitting.
arXiv Detail & Related papers (2023-07-24T19:41:19Z) - Uncertainty-aware Self-training for Low-resource Neural Sequence
Labeling [29.744621356187764]
This paper presents SeqUST, a novel uncertain-aware self-training framework for Neural sequence labeling (NSL)
We incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation at the token level and then select reliable language tokens from unlabeled data.
A well-designed masked sequence labeling task with a noise-robust loss supports robust training, which aims to suppress the problem of noisy pseudo labels.
arXiv Detail & Related papers (2023-02-17T02:40:04Z) - Optimizing the Noise in Self-Supervised Learning: from Importance
Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs)
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z) - Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets [23.4536532321199]
We propose an Uncertainty-aware Label Correction framework to handle label noise on imbalanced datasets.
Inspired by our observations, we propose an Uncertainty-aware Label Correction framework to handle label noise on imbalanced datasets.
arXiv Detail & Related papers (2022-07-12T11:35:55Z) - S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFARCIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z) - Open-set Label Noise Can Improve Robustness Against Inherent Label Noise [27.885927200376386]
We show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels.
We propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training.
arXiv Detail & Related papers (2021-06-21T07:15:50Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic
Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z) - Exploiting Sample Uncertainty for Domain Adaptive Person
Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.