An Investigation of how Label Smoothing Affects Generalization
- URL: http://arxiv.org/abs/2010.12648v1
- Date: Fri, 23 Oct 2020 20:26:25 GMT
- Title: An Investigation of how Label Smoothing Affects Generalization
- Authors: Blair Chen, Liu Ziyin, Zihao Wang, Paul Pu Liang
- Abstract summary: We show the role label smoothing plays in controlling the generalization loss.
Our theory also predicts the existence of an optimal label smoothing point.
Our findings will help both theoreticians and practitioners understand label smoothing.
- Score: 22.663974656813824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been hypothesized that label smoothing can reduce overfitting and
improve generalization, and current empirical evidence seems to corroborate
these effects. However, there is a lack of mathematical understanding of when
and why such empirical improvements occur. In this paper, as a step towards
understanding why label smoothing is effective, we propose a theoretical
framework to show how label smoothing helps control the generalization loss. In
particular, we show that this benefit can be precisely formulated and
identified in the label noise setting, where the training data is
partially mislabeled. Our theory also predicts the existence of an optimal
label smoothing point, a single value for the label smoothing hyperparameter
that minimizes generalization loss. Extensive experiments are done to confirm
the predictions of our theory. We believe that our findings will help both
theoreticians and practitioners understand label smoothing, and better apply
it to real-world datasets.
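As a concrete illustration (not taken from the paper), label smoothing with hyperparameter alpha mixes the one-hot target with the uniform distribution over the K classes before computing cross-entropy. The following minimal Python sketch shows this; the class count, probabilities, and alpha value are illustrative assumptions:

```python
import math

def smooth_labels(y_onehot, alpha):
    # Mix the one-hot target with the uniform distribution over K classes:
    # y_smooth = (1 - alpha) * y_onehot + alpha / K
    k = len(y_onehot)
    return [(1.0 - alpha) * y + alpha / k for y in y_onehot]

def cross_entropy(probs, targets):
    # Cross-entropy between predicted class probabilities and (soft) targets.
    return -sum(t * math.log(p) for p, t in zip(probs, targets))

# Hypothetical 3-class example: true class is 0, alpha = 0.1
y = [1.0, 0.0, 0.0]   # one-hot target
p = [0.7, 0.2, 0.1]   # model's predicted probabilities

loss_hard = cross_entropy(p, y)                          # standard CE
loss_soft = cross_entropy(p, smooth_labels(y, alpha=0.1))  # smoothed CE
```

The "optimal label smoothing point" predicted by the paper's theory corresponds to the alpha value that minimizes generalization loss; in practice one would sweep alpha on held-out data rather than the fixed 0.1 used here.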
Related papers
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z) - Learning From Biased Soft Labels [48.84637168570285]
A study has demonstrated that knowledge distillation and label smoothing can be unified as learning from soft labels.
This paper studies whether biased soft labels are still effective.
arXiv Detail & Related papers (2023-02-16T08:57:48Z) - Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation [16.878277421402945]
We propose a regularization scheme that brings dynamic nature into the smoothing parameter.
A model in training self-regulates the extent of smoothing on the fly during forward propagation.
arXiv Detail & Related papers (2022-10-22T11:52:38Z) - Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study [59.95267695402516]
This work aims to empirically examine whether label smoothing is truly incompatible with knowledge distillation.
We provide a novel connection on how label smoothing affects distributions of semantically similar and dissimilar classes.
We study its one-sidedness and imperfection of the incompatibility view through massive analyses, visualizations and comprehensive experiments.
arXiv Detail & Related papers (2021-04-01T17:59:12Z) - Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective [63.87421152879726]
We investigate the bias-variance tradeoff brought by distillation with soft labels.
We propose the novel weighted soft labels to help the network adaptively handle the sample-wise bias-variance tradeoff.
arXiv Detail & Related papers (2021-02-01T05:53:04Z) - Learning to Purify Noisy Labels via Meta Soft Label Corrector [49.92310583232323]
Recent deep neural networks (DNNs) can easily overfit to biased training data with noisy labels.
Label correction strategy is commonly used to alleviate this issue.
We propose a meta-learning model that estimates soft labels through a meta-gradient descent step.
arXiv Detail & Related papers (2020-08-03T03:25:17Z) - Does label smoothing mitigate label noise? [57.76529645344897]
We show that label smoothing is competitive with loss-correction under label noise.
We show that when distilling models from noisy data, label smoothing of the teacher is beneficial.
arXiv Detail & Related papers (2020-03-05T18:43:17Z) - Regularization via Structural Label Smoothing [22.74769739125912]
Regularization is an effective way to promote the generalization performance of machine learning models.
In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network.
We show that such label smoothing imposes a quantifiable bias in the Bayes error rate of the training data.
arXiv Detail & Related papers (2020-01-07T05:45:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.