Cross-View Consistency Regularisation for Knowledge Distillation
- URL: http://arxiv.org/abs/2412.16493v1
- Date: Sat, 21 Dec 2024 05:41:47 GMT
- Title: Cross-View Consistency Regularisation for Knowledge Distillation
- Authors: Weijia Zhang, Dongnan Liu, Weidong Cai, Chao Ma
- Abstract summary: This work is inspired by the success of cross-view learning in fields such as semi-supervised learning.
We introduce within-view and cross-view regularisations to standard logit-based distillation frameworks.
We also perform confidence-based soft label mining to improve the quality of distilling signals from the teacher.
- Score: 13.918476599394603
- License:
- Abstract: Knowledge distillation (KD) is an established paradigm for transferring privileged knowledge from a cumbersome model to a lightweight and efficient one. In recent years, logit-based KD methods have been quickly catching up in performance with their feature-based counterparts. However, previous research has pointed out that logit-based methods are still fundamentally limited by two major issues in their training process, namely the overconfident teacher and confirmation bias. Inspired by the success of cross-view learning in fields such as semi-supervised learning, in this work we introduce within-view and cross-view regularisations to standard logit-based distillation frameworks to combat these two issues. We also perform confidence-based soft label mining to improve the quality of distilling signals from the teacher, which further mitigates the confirmation bias problem. Despite its apparent simplicity, the proposed Consistency-Regularisation-based Logit Distillation (CRLD) significantly boosts student learning, setting new state-of-the-art results on the standard CIFAR-100, Tiny-ImageNet, and ImageNet datasets across a diversity of teacher and student architectures, whilst introducing no extra network parameters. Orthogonal to ongoing logit-based distillation research, our method enjoys excellent generalisation properties and, without bells and whistles, boosts the performance of various existing approaches by considerable margins.
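The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows how temperature-scaled logit distillation could be combined with within-view and cross-view consistency terms plus confidence-based soft-label mining. It is a minimal PyTorch-style sketch under assumed names (weak/strong augmented views, `conf_threshold`, the loss weights), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def kd_divergence(student_logits, teacher_logits, T=4.0):
    """Per-sample temperature-scaled KL divergence used in standard logit distillation."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    kl = (p_t * (torch.log(p_t.clamp_min(1e-8)) - log_p_s)).sum(dim=1)
    return kl * (T * T)

def crld_style_loss(student, teacher, x_weak, x_strong, labels,
                    T=4.0, conf_threshold=0.75, kd_weight=1.0):
    """Illustrative sketch only (not the official CRLD code):
    - within-view terms: the student matches the teacher on the same augmented view;
    - cross-view terms: the student on one view matches the teacher on the other view;
    - soft-label mining: distillation terms are kept only where the teacher is confident.
    """
    with torch.no_grad():
        t_weak, t_strong = teacher(x_weak), teacher(x_strong)
        # confidence-based soft-label mining: mask out low-confidence teacher predictions
        conf = F.softmax(t_weak / T, dim=1).max(dim=1).values
        mask = (conf >= conf_threshold).float()

    s_weak, s_strong = student(x_weak), student(x_strong)

    # supervised cross-entropy on both augmented views
    ce = F.cross_entropy(s_weak, labels) + F.cross_entropy(s_strong, labels)

    # within-view distillation (same view for teacher and student)
    within = kd_divergence(s_weak, t_weak, T) + kd_divergence(s_strong, t_strong, T)

    # cross-view distillation (student's view differs from the teacher's view)
    cross = kd_divergence(s_strong, t_weak, T) + kd_divergence(s_weak, t_strong, T)

    # apply the per-sample teacher-confidence mask to the distillation terms
    kd = (mask * (within + cross)).mean()
    return ce + kd_weight * kd
```

In practice the temperature, confidence threshold, and loss weights would need tuning per dataset and teacher-student pair.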
Related papers
- Faithful Label-free Knowledge Distillation [8.572967695281054]
This paper presents a label-free knowledge distillation approach called Teacher in the Middle (TinTeM).
It produces a more faithful student, which better replicates the behavior of the teacher network across a range of benchmarks testing model robustness, generalisability and out-of-distribution detection.
arXiv Detail & Related papers (2024-11-22T01:48:44Z) - Adaptive Explicit Knowledge Transfer for Knowledge Distillation [17.739979156009696]
We show that the performance of logit-based knowledge distillation can be improved by effectively delivering the probability distribution for the non-target classes from the teacher model.
We propose a new loss that enables the student to learn explicit knowledge along with implicit knowledge in an adaptive manner.
Experimental results demonstrate that the proposed adaptive explicit knowledge transfer (AEKT) method achieves improved performance compared to state-of-the-art KD methods.
arXiv Detail & Related papers (2024-09-03T07:42:59Z) - Faithful Knowledge Distillation [75.59907631395849]
We focus on two crucial questions with regard to a teacher-student pair: (i) do the teacher and student disagree at points close to correctly classified dataset examples, and (ii) is the distilled student as confident as the teacher around dataset examples?
These are critical questions when considering the deployment of a smaller student network trained from a robust teacher within a safety-critical setting.
arXiv Detail & Related papers (2023-06-07T13:41:55Z) - Class-aware Information for Logit-based Knowledge Distillation [16.634819319915923]
We propose a Class-aware Logit Knowledge Distillation (CLKD) method that extends logit distillation to both the instance level and the class level.
CLKD enables the student model to mimic higher-level semantic information from the teacher model, thereby improving distillation performance.
arXiv Detail & Related papers (2022-11-27T09:27:50Z) - Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z) - On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z) - Class-Balanced Distillation for Long-Tailed Visual Recognition [100.10293372607222]
Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions.
In this work, we introduce a new framework built on the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting.
Our main contribution is a new training method that leverages knowledge distillation to enhance feature representations.
arXiv Detail & Related papers (2021-04-12T08:21:03Z) - Hybrid Discriminative-Generative Training via Contrastive Learning [96.56164427726203]
We show that, through the perspective of hybrid discriminative-generative training of energy-based models, we can make a direct connection between contrastive learning and supervised learning.
We show that our specific choice of approximation to the energy-based loss outperforms existing practice in terms of the classification accuracy of WideResNet on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2020-07-17T15:50:34Z) - Self-Knowledge Distillation with Progressive Refinement of Targets [1.1470070927586016]
We propose a simple yet effective regularization method named progressive self-knowledge distillation (PS-KD).
PS-KD progressively distills a model's own knowledge to soften hard targets during training (a minimal sketch of this target-softening scheme appears after this list).
We show that PS-KD provides a hard-example-mining effect by rescaling gradients according to the difficulty of classifying examples.
arXiv Detail & Related papers (2020-06-22T04:06:36Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
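Following the PS-KD entry above, the sketch below illustrates the progressive target-softening idea it describes: the one-hot label is gradually mixed with the model's own predictions from the previous epoch. The linear schedule and all names (`past_probs`, `alpha_max`) are illustrative assumptions, not the paper's exact recipe.

```python
import torch.nn.functional as F

def ps_kd_style_targets(labels, past_probs, epoch, total_epochs,
                        num_classes=100, alpha_max=0.8):
    """Mix hard labels with the model's own past predictions; the mixing weight
    grows over training so targets are softened progressively (illustrative schedule)."""
    alpha_t = alpha_max * (epoch + 1) / total_epochs
    one_hot = F.one_hot(labels, num_classes).float()
    return (1.0 - alpha_t) * one_hot + alpha_t * past_probs

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against soft targets (reduces to standard CE for one-hot targets)."""
    return -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```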
This list is automatically generated from the titles and abstracts of the papers listed on this site.