Is Label Smoothing Truly Incompatible with Knowledge Distillation: An
Empirical Study
- URL: http://arxiv.org/abs/2104.00676v1
- Date: Thu, 1 Apr 2021 17:59:12 GMT
- Title: Is Label Smoothing Truly Incompatible with Knowledge Distillation: An
Empirical Study
- Authors: Zhiqiang Shen and Zechun Liu and Dejia Xu and Zitian Chen and
Kwang-Ting Cheng and Marios Savvides
- Abstract summary: This work aims to empirically clarify a recently discovered perspective that label smoothing is incompatible with knowledge distillation.
We draw a novel connection showing how label smoothing affects the distributions of semantically similar and dissimilar classes.
We study the one-sidedness and imperfection of the incompatibility view through extensive analyses, visualizations and comprehensive experiments.
- Score: 59.95267695402516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work aims to empirically clarify a recently discovered perspective that
label smoothing is incompatible with knowledge distillation. We begin by
introducing the motivation behind how this incompatibility arose, i.e., the
claim that label smoothing erases the relative information between teacher
logits. We provide a novel connection showing how label smoothing affects the
distributions of semantically similar and dissimilar classes. Then we propose
a metric to quantitatively measure the degree of erased information in a
sample's representation. After that, we study the one-sidedness and
imperfection of the incompatibility view through extensive analyses,
visualizations and comprehensive experiments on Image Classification, Binary
Networks, and Neural Machine Translation. Finally, we
broadly discuss several circumstances wherein label smoothing will indeed lose
its effectiveness. Project page:
http://zhiqiangshen.com/projects/LS_and_KD/index.html.
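To make the two ingredients of the debate concrete, here is a minimal sketch of standard label smoothing and the standard Hinton-style distillation loss in PyTorch. This is generic background, not the paper's proposed metric; names and hyperparameter values are illustrative.

```python
import torch
import torch.nn.functional as F

def smoothed_targets(labels: torch.Tensor, num_classes: int, eps: float = 0.1) -> torch.Tensor:
    # Standard label smoothing: mix the one-hot target with a uniform
    # distribution. Flattening the non-target probabilities is the mechanism
    # the incompatibility argument says erases relative information between
    # teacher logits.
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1.0 - eps) + eps / num_classes

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    # Standard knowledge distillation (Hinton et al.): KL divergence between
    # temperature-softened teacher and student distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```

A teacher trained on smoothed_targets tends to produce flatter logits over the non-target classes, which is precisely the behavior the incompatibility view builds on and this paper re-examines.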
Related papers
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense
Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our findings highlight the value of using VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- Learning From Biased Soft Labels [48.84637168570285]
A study has demonstrated that knowledge distillation and label smoothing can be unified as learning from soft labels.
This paper studies whether biased soft labels are still effective.
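The unified view can be stated compactly. As a sketch, under assumed notation (alpha the smoothing rate, lambda the distillation weight, e_y the one-hot label, u the uniform distribution, p_T^tau the temperature-softened teacher, H the cross-entropy), both methods minimize a cross-entropy to a soft target; only the construction of that target differs:

```latex
\mathcal{L}(\theta) = H\!\left(\tilde{y},\, p_\theta(x)\right),
\qquad
\tilde{y} =
\begin{cases}
(1-\alpha)\, e_y + \alpha\, u, & \text{label smoothing (uniform ``teacher'')},\\
(1-\lambda)\, e_y + \lambda\, p_T^{\tau}(x), & \text{knowledge distillation (learned teacher)}.
\end{cases}
```

Roughly speaking, a biased soft label then corresponds to a teacher term that systematically deviates from the underlying class structure.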
arXiv Detail & Related papers (2023-02-16T08:57:48Z)
- A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning [111.05365744744437]
Unsupervised contrastive learning labels crops of the same image as positives, and other image crops as negatives.
In this work, we first prove that for contrastive learning, inaccurate label assignment heavily impairs its generalization for semantic instance discrimination.
Inspired by this theory, we propose a novel self-labeling refinement approach for contrastive learning.
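For context on what label assignment means here, this is a minimal SimCLR-style InfoNCE sketch (generic background, not the paper's refinement method; the batch layout and names are assumptions):

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    # z1, z2: (N, D) embeddings of two augmented crops of the same N images.
    # Crop i of view 1 and crop i of view 2 form the positive pair; every
    # other crop in the batch is treated as a negative.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)   # (2N, D)
    sim = z @ z.t() / t                                    # (2N, 2N)
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                  # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

Every other crop in the batch is labeled a negative even when it comes from the same semantic class; that misassignment is the generalization issue the paper formalizes and then mitigates by refining the labels.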
arXiv Detail & Related papers (2021-06-28T14:24:52Z)
- Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective [63.87421152879726]
We investigate the bias-variance tradeoff brought by distillation with soft labels.
We propose novel weighted soft labels that help the network adaptively handle the sample-wise bias-variance tradeoff.
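A hedged sketch of the sample-wise weighting idea: the actual rule for setting each weight comes from the paper's bias-variance analysis, so `sample_weight` below is a hypothetical per-sample tensor in [0, 1].

```python
import torch
import torch.nn.functional as F

def weighted_soft_label_loss(student_logits, teacher_logits, labels,
                             sample_weight, T: float = 4.0):
    # Sample-wise mixture of hard-label CE and soft-label KD. How to choose
    # sample_weight is the paper's contribution; here it is a hypothetical
    # placeholder tensor of shape (N,) with values in [0, 1].
    ce = F.cross_entropy(student_logits, labels, reduction="none")      # (N,)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    kd = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1) * T * T   # (N,)
    return ((1.0 - sample_weight) * ce + sample_weight * kd).mean()
```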
arXiv Detail & Related papers (2021-02-01T05:53:04Z)
- An Investigation of how Label Smoothing Affects Generalization [22.663974656813824]
We show the role label smoothing plays in controlling the generalization loss.
Our theory also predicts the existence of an optimal label smoothing point.
Our findings will help both theoreticians and practitioners understand label smoothing.
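The existence of an optimal smoothing point suggests a simple empirical check. A minimal sketch, assuming a hypothetical helper train_and_eval(eps) that trains with smoothing rate eps and returns a validation loss:

```python
def find_best_smoothing(train_and_eval, grid=(0.0, 0.05, 0.1, 0.2, 0.3)):
    # Hypothetical sweep: evaluate each smoothing rate and keep the one
    # with the lowest validation loss. The theory predicts an interior
    # optimum rather than a monotone trend.
    results = {eps: train_and_eval(eps) for eps in grid}
    best = min(results, key=results.get)
    return best, results
```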
arXiv Detail & Related papers (2020-10-23T20:26:25Z)
- Debiased Contrastive Learning [64.98602526764599]
We develop a debiased contrastive objective that corrects for the sampling of same-label datapoints.
Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks.
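A rough sketch of the published estimator under assumed notation: per anchor, one positive similarity and N negative similarities, temperature t, and class prior tau_plus. The corrected negative term subtracts the expected contribution of same-label "negatives" and is clipped from below, as in the paper.

```python
import math
import torch

def debiased_contrastive_loss(pos_sim: torch.Tensor, neg_sim: torch.Tensor,
                              tau_plus: float = 0.1, t: float = 0.5) -> torch.Tensor:
    # pos_sim: (B,) similarity to the positive; neg_sim: (B, N) similarities
    # to N sampled "negatives", some of which may share the anchor's label.
    pos = torch.exp(pos_sim / t)                           # (B,)
    neg = torch.exp(neg_sim / t)                           # (B, N)
    N = neg.size(1)
    # Corrected negative term: remove the estimated same-label contribution
    # (weighted by the class prior tau_plus), then floor at e^{-1/t}.
    g = (neg.mean(dim=1) - tau_plus * pos) / (1.0 - tau_plus)
    g = torch.clamp(g, min=math.exp(-1.0 / t))
    return -torch.log(pos / (pos + N * g)).mean()
```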
arXiv Detail & Related papers (2020-07-01T04:25:24Z)
- Does label smoothing mitigate label noise? [57.76529645344897]
We show that label smoothing is competitive with loss-correction under label noise.
We show that when distilling models from noisy data, label smoothing of the teacher is beneficial.
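For background on how the two techniques relate structurally (standard definitions under assumed notation, not the paper's derivation): with K classes, symmetric flip probability rho, one-hot label e_y, and smoothing rate alpha,

```latex
T = (1-\rho)\,I + \frac{\rho}{K-1}\bigl(\mathbf{1}\mathbf{1}^{\top} - I\bigr)
\quad \text{(symmetric-noise transition matrix)},

\hat{\ell}(p, y) = \bigl[\,T^{-1}\,\ell(p,\cdot)\,\bigr]_y
\quad \text{(backward loss correction)},
\qquad
\tilde{y} = (1-\alpha)\,e_y + \frac{\alpha}{K}\,\mathbf{1}
\quad \text{(smoothed target)}.
```

Both operations blend the observed label with a uniform component, which is why it is natural to ask whether smoothing alone can match an explicit correction.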
arXiv Detail & Related papers (2020-03-05T18:43:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.