Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance
Tradeoff Perspective
- URL: http://arxiv.org/abs/2102.00650v1
- Date: Mon, 1 Feb 2021 05:53:04 GMT
- Title: Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance
Tradeoff Perspective
- Authors: Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong
Yuan, Qian Zhang
- Abstract summary: We investigate the bias-variance tradeoff brought by distillation with soft labels.
We propose novel weighted soft labels that help the network adaptively handle the sample-wise bias-variance tradeoff.
- Score: 63.87421152879726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation is an effective approach that leverages a
well-trained network, or an ensemble of them, referred to as the teacher, to
guide the training of a student network. The outputs of the teacher network are
used as soft labels to supervise the training of the student. Recent studies
\citep{muller2019does,yuan2020revisiting} revealed an intriguing property of
soft labels: making labels soft serves as a good regularizer for the student
network. From the perspective of statistical learning, regularization aims to
reduce variance; however, it is unclear how bias and variance change when
training with soft labels. In this paper, we investigate the bias-variance
tradeoff brought about by distillation with soft labels. Specifically, we
observe that during training the bias-variance tradeoff varies from sample to
sample. Further, under the same distillation temperature, we observe that
distillation performance is negatively associated with the number of a
particular kind of sample, which we term regularization samples because they
increase bias and decrease variance. Nevertheless, we empirically find that
completely filtering out regularization samples also deteriorates distillation
performance. These findings motivate our proposed weighted soft labels, which
help the network adaptively handle the sample-wise bias-variance tradeoff.
Experiments on standard evaluation benchmarks validate the effectiveness of our
method. Our code is available at
\url{https://github.com/bellymonster/Weighted-Soft-Label-Distillation}.
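To make the setup concrete, the sketch below shows a standard temperature-scaled distillation loss extended with per-sample weights on the soft-label term. The weighting rule used here (down-weighting samples the teacher misclassifies) is only an illustrative placeholder, not the paper's actual formulation; the official implementation is in the repository linked above.

```python
import torch
import torch.nn.functional as F

def weighted_soft_label_kd_loss(student_logits, teacher_logits, targets,
                                temperature=4.0, alpha=0.9):
    """Hard-label cross entropy plus a per-sample weighted KL term on
    temperature-softened teacher outputs (illustrative sketch only)."""
    # Hard-label cross entropy, kept per sample.
    ce = F.cross_entropy(student_logits, targets, reduction="none")

    # Temperature-scaled soft labels from the teacher.
    t_prob = F.softmax(teacher_logits / temperature, dim=1)
    s_log_prob = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(s_log_prob, t_prob, reduction="none").sum(dim=1) * temperature ** 2

    # Placeholder per-sample weights: halve the distillation weight on samples
    # the teacher gets wrong (a crude proxy for "regularization samples").
    # The paper's actual weighting differs -- consult the official code.
    with torch.no_grad():
        teacher_correct = (teacher_logits.argmax(dim=1) == targets).float()
        weights = 0.5 + 0.5 * teacher_correct

    return ((1 - alpha) * ce + alpha * weights * kd).mean()
```

In the paper, the weights are instead derived from the sample-wise bias-variance behaviour described in the abstract, rather than from teacher correctness alone.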
Related papers
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Learning From Biased Soft Labels [48.84637168570285]
A study has demonstrated that knowledge distillation and label smoothing can be unified as learning from soft labels.
This paper studies whether biased soft labels are still effective.
arXiv Detail & Related papers (2023-02-16T08:57:48Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Weighted Distillation with Unlabeled Examples [15.825078347452024]
Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited.
This paper proposes a principled approach to counteracting the bias introduced by the teacher's imperfect predictions on the unlabeled examples, based on a "debiasing" reweighting of the student's loss function tailored to the distillation training paradigm.
arXiv Detail & Related papers (2022-10-13T04:08:56Z)
- A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning [111.05365744744437]
Unsupervised contrastive learning labels crops of the same image as positives and crops of other images as negatives.
In this work, we first prove that for contrastive learning, inaccurate label assignment heavily impairs its generalization for semantic instance discrimination.
Inspired by this theory, we propose a novel self-labeling refinement approach for contrastive learning.
arXiv Detail & Related papers (2021-06-28T14:24:52Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study [59.95267695402516]
This work aims to empirically clarify the recently reported argument that label smoothing is incompatible with knowledge distillation.
We provide a novel connection showing how label smoothing affects the distributions of semantically similar and dissimilar classes.
We examine the one-sidedness and imperfections of the incompatibility view through extensive analyses, visualizations, and comprehensive experiments (a short sketch contrasting label-smoothed and distillation soft targets follows this list).
arXiv Detail & Related papers (2021-04-01T17:59:12Z)
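For context on the label-smoothing discussion in the abstract and in the last entry above, the sketch below contrasts the two kinds of soft targets involved, assuming a standard C-class softmax classifier. It is background illustration only, not code from any of the listed papers.

```python
import torch
import torch.nn.functional as F

def label_smoothing_targets(targets, num_classes, epsilon=0.1):
    """Smoothed one-hot targets: (1 - eps) on the true class plus a uniform
    eps / C mass spread over all classes (the common Szegedy-style rule)."""
    one_hot = F.one_hot(targets, num_classes).float()
    return one_hot * (1.0 - epsilon) + epsilon / num_classes

def teacher_soft_targets(teacher_logits, temperature=4.0):
    """Distillation soft labels: temperature-scaled softmax of teacher logits.
    Unlike label smoothing, the off-target mass is data-dependent and encodes
    inter-class similarity."""
    return F.softmax(teacher_logits / temperature, dim=1)
```

Both constructions produce soft targets, which is why knowledge distillation and label smoothing can be unified as learning from soft labels; whether the uniform off-target mass of label smoothing helps or hurts subsequent distillation is what the incompatibility debate above concerns.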
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.