Self-Knowledge Distillation with Progressive Refinement of Targets
- URL: http://arxiv.org/abs/2006.12000v3
- Date: Thu, 7 Oct 2021 13:09:26 GMT
- Title: Self-Knowledge Distillation with Progressive Refinement of Targets
- Authors: Kyungyul Kim, ByeongMoon Ji, Doyoung Yoon, Sangheum Hwang
- Abstract summary: We propose a simple yet effective regularization method named progressive self-knowledge distillation (PS-KD).
PS-KD progressively distills a model's own knowledge to soften hard targets during training.
We show that PS-KD provides an effect of hard example mining by rescaling gradients according to difficulty in classifying examples.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalization capability of deep neural networks has been substantially
improved by applying a wide spectrum of regularization methods, e.g.,
restricting function space, injecting randomness during training, augmenting
data, etc. In this work, we propose a simple yet effective regularization
method named progressive self-knowledge distillation (PS-KD), which
progressively distills a model's own knowledge to soften hard targets (i.e.,
one-hot vectors) during training. Hence, it can be interpreted within a
framework of knowledge distillation as a student becomes a teacher itself.
Specifically, targets are adjusted adaptively by combining the ground-truth and
past predictions from the model itself. We show that PS-KD provides an effect
of hard example mining by rescaling gradients according to difficulty in
classifying examples. The proposed method is applicable to any supervised
learning tasks with hard targets and can be easily combined with existing
regularization methods to further enhance the generalization performance.
Furthermore, it is confirmed that PS-KD achieves not only better accuracy, but
also provides high quality of confidence estimates in terms of calibration as
well as ordinal ranking. Extensive experimental results on three different
tasks, image classification, object detection, and machine translation,
demonstrate that our method consistently improves the performance of the
state-of-the-art baselines. The code is available at
https://github.com/lgcnsai/PS-KD-Pytorch.
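To make the target-adjustment step concrete, the following PyTorch-style sketch blends the one-hot label with the model's own predictions from the previous epoch. The linear schedule for the blend weight and the names (pskd_targets, pskd_loss, alpha_T) are illustrative assumptions rather than the authors' implementation; the linked repository contains the official code.

```python
# Minimal sketch of PS-KD target softening (assumed names and schedule, not
# the authors' code): the "teacher" is the model's own prediction from the
# previous epoch, and its weight alpha_t grows linearly over training.
import torch
import torch.nn.functional as F

def pskd_targets(onehot, past_logits, epoch, total_epochs, alpha_T=0.8):
    # alpha_t = alpha_T * t / T: start near pure one-hot targets and
    # progressively trust the model's own past predictions more.
    alpha_t = alpha_T * epoch / total_epochs
    past_probs = F.softmax(past_logits, dim=1)  # the past self acts as teacher
    return (1.0 - alpha_t) * onehot + alpha_t * past_probs

def pskd_loss(logits, onehot, past_logits, epoch, total_epochs, alpha_T=0.8):
    # Cross-entropy of the current prediction against the softened targets.
    soft = pskd_targets(onehot, past_logits, epoch, total_epochs, alpha_T)
    return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Example: 3 classes, batch of 2, with logits cached from the previous epoch.
labels = torch.tensor([0, 2])
onehot = F.one_hot(labels, num_classes=3).float()
current_logits = torch.randn(2, 3, requires_grad=True)
past_logits = torch.randn(2, 3)
pskd_loss(current_logits, onehot, past_logits, epoch=10, total_epochs=100).backward()
```

Since the gradient of this loss with respect to the logits is the softmax output minus the softened target, the softened target shrinks the gradient more for examples the past model already classified confidently than for hard ones, which is one way to read the gradient-rescaling (hard example mining) effect noted in the abstract.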
Related papers
- Adaptive Explicit Knowledge Transfer for Knowledge Distillation [17.739979156009696]
We show that the performance of logit-based knowledge distillation can be improved by effectively delivering the probability distribution for the non-target classes from the teacher model.
We propose a new loss that enables the student to learn explicit knowledge along with implicit knowledge in an adaptive manner.
Experimental results demonstrate that the proposed method, called adaptive explicit knowledge transfer (AEKT) method, achieves improved performance compared to the state-of-the-art KD methods.
arXiv Detail & Related papers (2024-09-03T07:42:59Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Self-Knowledge Distillation via Dropout [0.7883397954991659]
We propose a simple and effective self-knowledge distillation method using dropout (SD-Dropout).
Our method does not require any additional trainable modules, does not rely on data, and requires only simple operations.
arXiv Detail & Related papers (2022-08-11T05:08:55Z)
- Robust and Accurate Object Detection via Self-Knowledge Distillation [9.508466066051572]
Unified Decoupled Feature Alignment (UDFA) is a novel fine-tuning paradigm which achieves better performance than existing methods.
We show that UDFA can surpass the standard training and state-of-the-art adversarial training methods for object detection.
arXiv Detail & Related papers (2021-11-14T04:40:15Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space (a generic sketch of such an estimator appears after this list).
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
- Regularized Training and Tight Certification for Randomized Smoothed Classifier with Provable Robustness [15.38718018477333]
We derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart.
We also design a new certification algorithm, which can leverage the regularization effect to provide a tighter robustness lower bound that holds with high probability.
arXiv Detail & Related papers (2020-02-17T20:54:34Z)
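The OSAKD entry above mentions k-NN non-parametric density estimation over output features. As a generic, hedged sketch of that technique (not necessarily the exact estimator used by that paper), the density at a query point can be approximated from the distance to its k-th nearest reference sample; the function name knn_density, the choice of Euclidean distance, and k=10 are illustrative assumptions.

```python
# Generic k-NN density estimation: p(x) ~ k / (n * V_d(r_k)), where r_k is the
# distance from x to its k-th nearest reference sample and V_d(r) is the volume
# of a d-dimensional ball of radius r. Illustrative only; OSAKD's own estimator
# may differ in detail.
import numpy as np
from math import lgamma, log, pi

def knn_density(queries, samples, k=10):
    n, d = samples.shape
    # Euclidean distances from every query feature to every reference feature.
    dists = np.linalg.norm(queries[:, None, :] - samples[None, :, :], axis=2)
    r_k = np.sort(dists, axis=1)[:, k - 1]                 # k-th NN distance
    log_unit_ball = (d / 2) * log(pi) - lgamma(d / 2 + 1)  # log volume at radius 1
    log_volume = log_unit_ball + d * np.log(r_k)           # scale by r_k^d
    return np.exp(log(k) - log(n) - log_volume)

# Example: densities of 5 query features against 1000 reference features.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 8))
queries = rng.normal(size=(5, 8))
print(knn_density(queries, reference, k=10))
```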
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.