Self-Knowledge Distillation with Progressive Refinement of Targets
- URL: http://arxiv.org/abs/2006.12000v3
- Date: Thu, 7 Oct 2021 13:09:26 GMT
- Title: Self-Knowledge Distillation with Progressive Refinement of Targets
- Authors: Kyungyul Kim, ByeongMoon Ji, Doyoung Yoon, Sangheum Hwang
- Abstract summary: We propose a simple yet effective regularization method named progressive self-knowledge distillation (PS-KD).
PS-KD progressively distills a model's own knowledge to soften hard targets during training.
We show that PS-KD provides an effect of hard example mining by rescaling gradients according to difficulty in classifying examples.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalization capability of deep neural networks has been substantially
improved by applying a wide spectrum of regularization methods, e.g.,
restricting function space, injecting randomness during training, augmenting
data, etc. In this work, we propose a simple yet effective regularization
method named progressive self-knowledge distillation (PS-KD), which
progressively distills a model's own knowledge to soften hard targets (i.e.,
one-hot vectors) during training. Hence, it can be interpreted within a
framework of knowledge distillation as a student becomes a teacher itself.
Specifically, targets are adjusted adaptively by combining the ground-truth and
past predictions from the model itself. We show that PS-KD provides an effect
of hard example mining by rescaling gradients according to difficulty in
classifying examples. The proposed method is applicable to any supervised
learning tasks with hard targets and can be easily combined with existing
regularization methods to further enhance the generalization performance.
Furthermore, it is confirmed that PS-KD achieves not only better accuracy, but
also provides high quality of confidence estimates in terms of calibration as
well as ordinal ranking. Extensive experimental results on three different
tasks, image classification, object detection, and machine translation,
demonstrate that our method consistently improves the performance of the
state-of-the-art baselines. The code is available at
https://github.com/lgcnsai/PS-KD-Pytorch.
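To make the target-adjustment step concrete, the following PyTorch-style sketch blends the one-hot label with the model's own predictions from the previous epoch. The linear schedule for the blend weight and the names (pskd_targets, pskd_loss, alpha_T) are illustrative assumptions rather than the authors' implementation; the linked repository contains the official code.

```python
# Minimal sketch of PS-KD target softening (assumed names and schedule, not
# the authors' code): the "teacher" is the model's own prediction from the
# previous epoch, and its weight alpha_t grows linearly over training.
import torch
import torch.nn.functional as F

def pskd_targets(onehot, past_logits, epoch, total_epochs, alpha_T=0.8):
    # alpha_t = alpha_T * t / T: start near pure one-hot targets and
    # progressively trust the model's own past predictions more.
    alpha_t = alpha_T * epoch / total_epochs
    past_probs = F.softmax(past_logits, dim=1)  # the past self acts as teacher
    return (1.0 - alpha_t) * onehot + alpha_t * past_probs

def pskd_loss(logits, onehot, past_logits, epoch, total_epochs, alpha_T=0.8):
    # Cross-entropy of the current prediction against the softened targets.
    soft = pskd_targets(onehot, past_logits, epoch, total_epochs, alpha_T)
    return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Example: 3 classes, batch of 2, with logits cached from the previous epoch.
labels = torch.tensor([0, 2])
onehot = F.one_hot(labels, num_classes=3).float()
current_logits = torch.randn(2, 3, requires_grad=True)
past_logits = torch.randn(2, 3)
pskd_loss(current_logits, onehot, past_logits, epoch=10, total_epochs=100).backward()
```

Since the gradient of this loss with respect to the logits is the softmax output minus the softened target, the softened target shrinks the gradient more for examples the past model already classified confidently than for hard ones, which is one way to read the gradient-rescaling (hard example mining) effect noted in the abstract.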
Related papers
- Adaptive Explicit Knowledge Transfer for Knowledge Distillation [17.739979156009696]
We show that the performance of logit-based knowledge distillation can be improved by effectively delivering the probability distribution for the non-target classes from the teacher model.
We propose a new loss that enables the student to learn explicit knowledge along with implicit knowledge in an adaptive manner.
Experimental results demonstrate that the proposed method, called adaptive explicit knowledge transfer (AEKT) method, achieves improved performance compared to the state-of-the-art KD methods.
arXiv Detail & Related papers (2024-09-03T07:42:59Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Self-Knowledge Distillation via Dropout [0.7883397954991659]
We propose a simple and effective self-knowledge distillation method using dropout (SD-Dropout).
Our method does not require any additional trainable modules, does not rely on data, and requires only simple operations.
arXiv Detail & Related papers (2022-08-11T05:08:55Z)
- Robust and Accurate Object Detection via Self-Knowledge Distillation [9.508466066051572]
Unified Decoupled Feature Alignment (UDFA) is a novel fine-tuning paradigm which achieves better performance than existing methods.
We show that UDFA can surpass the standard training and state-of-the-art adversarial training methods for object detection.
arXiv Detail & Related papers (2021-11-14T04:40:15Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space (a generic sketch of such an estimator appears after this list).
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
- Regularized Training and Tight Certification for Randomized Smoothed Classifier with Provable Robustness [15.38718018477333]
We derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart.
We also design a new certification algorithm, which can leverage the regularization effect to provide a tighter robustness lower bound that holds with high probability.
arXiv Detail & Related papers (2020-02-17T20:54:34Z)
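The OSAKD entry above mentions k-NN non-parametric density estimation over output features. As a generic, hedged sketch of that technique (not necessarily the exact estimator used by that paper), the density at a query point can be approximated from the distance to its k-th nearest reference sample; the function name knn_density, the choice of Euclidean distance, and k=10 are illustrative assumptions.

```python
# Generic k-NN density estimation: p(x) ~ k / (n * V_d(r_k)), where r_k is the
# distance from x to its k-th nearest reference sample and V_d(r) is the volume
# of a d-dimensional ball of radius r. Illustrative only; OSAKD's own estimator
# may differ in detail.
import numpy as np
from math import lgamma, log, pi

def knn_density(queries, samples, k=10):
    n, d = samples.shape
    # Euclidean distances from every query feature to every reference feature.
    dists = np.linalg.norm(queries[:, None, :] - samples[None, :, :], axis=2)
    r_k = np.sort(dists, axis=1)[:, k - 1]                 # k-th NN distance
    log_unit_ball = (d / 2) * log(pi) - lgamma(d / 2 + 1)  # log volume at radius 1
    log_volume = log_unit_ball + d * np.log(r_k)           # scale by r_k^d
    return np.exp(log(k) - log(n) - log_volume)

# Example: densities of 5 query features against 1000 reference features.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 8))
queries = rng.normal(size=(5, 8))
print(knn_density(queries, reference, k=10))
```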
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.