Deep Learning is Provably Robust to Symmetric Label Noise
- URL: http://arxiv.org/abs/2210.15083v1
- Date: Wed, 26 Oct 2022 23:41:17 GMT
- Title: Deep Learning is Provably Robust to Symmetric Label Noise
- Authors: Carey E. Priebe, Ningyuan Huang, Soledad Villar, Cong Mu, Li Chen
- Abstract summary: We show that certain deep neural networks (DNNs) can tolerate massive symmetric label noise up to the information-theoretic threshold.
We conjecture that for general label noise, mitigation strategies that make use of the noisy data will outperform those that ignore the noisy data.
- Score: 20.496591498468778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are capable of perfectly fitting the training
data, including memorizing noisy data. It is commonly believed that
memorization hurts generalization. Therefore, many recent works propose
mitigation strategies to avoid noisy data or correct memorization. In this
work, we step back and ask the question: Can deep learning be robust against
massive label noise without any mitigation? We provide an affirmative answer
for the case of symmetric label noise: We find that certain DNNs, including
under-parameterized and over-parameterized models, can tolerate massive
symmetric label noise up to the information-theoretic threshold. By appealing
to classical statistical theory and universal consistency of DNNs, we prove
that for multiclass classification, $L_1$-consistent DNN classifiers trained
under symmetric label noise can achieve Bayes optimality asymptotically if the
label noise probability is less than $\frac{K-1}{K}$, where $K \ge 2$ is the
number of classes. Our results show that for symmetric label noise, no
mitigation is necessary for $L_1$-consistent estimators. We conjecture that for
general label noise, mitigation strategies that make use of the noisy data will
outperform those that ignore the noisy data.
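The threshold $\frac{K-1}{K}$ in the abstract can be illustrated with a small sketch. It assumes the usual definition of symmetric label noise, which the abstract does not spell out: with probability $\epsilon$ the true label is replaced by one of the other $K-1$ classes chosen uniformly. Under that assumption the noisy class posterior is $\tilde{\eta}_k = (1-\epsilon)\eta_k + \frac{\epsilon}{K-1}(1-\eta_k)$, which is an increasing function of the clean posterior $\eta_k$ exactly when $\epsilon < \frac{K-1}{K}$, so the most likely class is unchanged below the threshold. The function names and the example posterior are hypothetical and are not taken from the paper.

```python
import random


def noisy_posterior(eta, eps):
    """Posterior of the noisy label, given the clean posterior `eta`.

    Assumption (not from the source): with probability `eps` the true
    label is replaced by one of the other K-1 classes, chosen uniformly.
    """
    K = len(eta)
    return [(1 - eps) * p + eps / (K - 1) * (1 - p) for p in eta]


def argmax(xs):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def flip_label(y, K, eps, rng):
    """Sample a noisy label for true label `y` under the same assumption."""
    if rng.random() < eps:
        return rng.choice([k for k in range(K) if k != y])
    return y


if __name__ == "__main__":
    eta = [0.5, 0.3, 0.2]      # hypothetical clean posterior, K = 3
    K = len(eta)
    threshold = (K - 1) / K    # 2/3 for K = 3
    for eps in (0.0, 0.3, 0.6, 0.7, 0.9):
        tilde = noisy_posterior(eta, eps)
        below = eps < threshold
        same = argmax(tilde) == argmax(eta)
        print(f"eps={eps:.1f} below_threshold={below} argmax_preserved={same}")
```

This only checks the population-level ordering of the class posteriors at a single point. The paper's result concerns $L_1$-consistent classifiers trained on noisy samples, which this sketch does not attempt to reproduce.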
Related papers
- Robust Classification with Noisy Labels Based on Posterior Maximization [4.550290285002704]
In this paper, we investigate the robustness to label noise of an $f$-divergence-based class of objective functions recently proposed for supervised classification.
We show that, in the presence of label noise, any of the $f$-PML objective functions can be corrected to obtain a neural network that is equal to the one learned with the clean dataset.
arXiv Detail & Related papers (2025-04-09T11:52:51Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since only a few samples are available.
When handling data with noisy labels, the meta-learner can be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on the CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels [44.79124350922491]
We propose using a theoretically guaranteed noisy label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL).
Specifically, we design a penalized regression to model the linear relation between network features and one-hot labels.
To make the framework scalable to datasets that contain a large number of categories and training data, we propose a split algorithm to divide the whole training set into small pieces.
arXiv Detail & Related papers (2022-03-15T11:09:58Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent one.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- Error-Bounded Correction of Noisy Labels [17.510654621245656]
We show that the prediction of a noisy classifier can indeed be a good indicator of whether the label of a training example is clean.
Based on the theoretical result, we propose a novel algorithm that corrects the labels based on the noisy classifier prediction.
We incorporate our label correction algorithm into the training of deep neural networks and train models that achieve superior testing performance on multiple public datasets.
arXiv Detail & Related papers (2020-11-19T19:23:23Z)
- Class2Simi: A Noise Reduction Perspective on Learning with Noisy Labels [98.13491369929798]
We propose a framework called Class2Simi, which transforms data points with noisy class labels to data pairs with noisy similarity labels.
Class2Simi is computationally efficient: the transformation is performed on-the-fly in mini-batches, and it only changes the loss on top of the model prediction into a pairwise form.
arXiv Detail & Related papers (2020-06-14T07:55:32Z)
- NoiseRank: Unsupervised Label Noise Reduction with Dependence Models [11.08987870095179]
We propose NoiseRank, a method for unsupervised label noise reduction using Markov Random Fields (MRF).
We construct a dependence model to estimate the posterior probability of an instance being incorrectly labeled given the dataset, and rank instances based on their estimated probabilities.
NoiseRank improves state-of-the-art classification on Food101-N (20% noise) and is effective on high-noise Clothing-1M (40% noise).
arXiv Detail & Related papers (2020-03-15T01:10:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.