Multi-granularity for knowledge distillation
- URL: http://arxiv.org/abs/2108.06681v1
- Date: Sun, 15 Aug 2021 07:47:08 GMT
- Title: Multi-granularity for knowledge distillation
- Authors: Baitan Shao, Ying Chen
- Abstract summary: Students have different abilities to understand the knowledge imparted by teachers.
A multi-granularity self-analyzing module of the teacher network is designed.
A stable excitation scheme is proposed for robust supervision for the student training.
- Score: 3.3970049571884204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Considering the fact that students have different abilities to understand the
knowledge imparted by teachers, a multi-granularity distillation mechanism is
proposed for transferring more understandable knowledge for student networks. A
multi-granularity self-analyzing module of the teacher network is designed,
which enables the student network to learn knowledge from different teaching
patterns. Furthermore, a stable excitation scheme is proposed for robust
supervision for the student training. The proposed distillation mechanism can
be embedded into different distillation frameworks, which are taken as
baselines. Experiments show the mechanism improves accuracy by 0.58% on
average, and by 1.08% in the best case, over the baselines, making its
performance superior to state-of-the-art methods. It is also shown that the
student's fine-tuning ability and robustness to noisy inputs can be improved
via the proposed mechanism. The code is available at
https://github.com/shaoeric/multi-granularity-distillation.
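
The abstract does not spell out the self-analyzing module in code; as one way to make the idea concrete, the sketch below approximates multi-granularity supervision by pooling teacher and student feature maps at several spatial granularities and matching them. The class name, the choice of granularities, and the MSE matching are illustrative assumptions, not taken from the authors' repository.

```python
# Illustrative sketch only: approximates the paper's multi-granularity idea by
# letting the student match teacher feature maps pooled at several spatial
# granularities. Names and pooling sizes are hypothetical, not the authors' API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGranularityLoss(nn.Module):
    """Match teacher/student feature maps at several pooled resolutions."""

    def __init__(self, granularities=(1, 2, 4)):
        super().__init__()
        self.granularities = granularities  # output sizes of adaptive pooling

    def forward(self, feat_s, feat_t):
        # feat_s, feat_t: (B, C, H, W) feature maps with matching channels
        loss = 0.0
        for g in self.granularities:
            ps = F.adaptive_avg_pool2d(feat_s, g)  # coarse ... fine views
            pt = F.adaptive_avg_pool2d(feat_t, g)
            loss = loss + F.mse_loss(ps, pt)
        return loss / len(self.granularities)

# Usage with dummy tensors standing in for teacher/student features:
if __name__ == "__main__":
    feat_t = torch.randn(8, 64, 16, 16)  # teacher feature map
    feat_s = torch.randn(8, 64, 16, 16)  # student feature map
    criterion = MultiGranularityLoss()
    print(criterion(feat_s, feat_t).item())
```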
Related papers
- Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification [0.0]
This paper proposes an uncertainty-aware dual-student knowledge distillation framework. We introduce a peer-learning mechanism where two heterogeneous student architectures learn collaboratively from both the teacher network and each other.
arXiv Detail & Related papers (2025-11-24T07:02:22Z)
- Balance Divergence for Knowledge Distillation [5.971722196386694]
Most existing knowledge distillation methods utilize Kullback-Leibler divergence to mimic the logit output probabilities between the teacher network and the student network.
This deficiency may lead to suboptimal performance in logit mimicry during the distillation process.
In this paper, we propose a novel method, named Balance Divergence Distillation.
arXiv Detail & Related papers (2025-01-14T03:12:25Z)
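
For context, the KL-based logit mimicry that this entry revisits is the standard temperature-scaled distillation loss of Hinton et al.; a minimal PyTorch sketch follows. The temperature value and weighting are conventional choices, not taken from this paper.

```python
# Standard temperature-scaled KL logit distillation, shown as context for the
# KL mimicry that Balance Divergence revisits.
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T, then take KL(teacher || student).
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    # T**2 rescales gradients back toward their original magnitude.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T ** 2)

# Example with random logits:
s = torch.randn(8, 10)  # student logits
t = torch.randn(8, 10)  # teacher logits
print(kd_kl_loss(s, t).item())
```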
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z)
- Supervision Complexity and its Role in Knowledge Distillation [65.07910515406209]
We study the generalization behavior of a distilled student.
The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher predictions, and the complexity of the teacher predictions.
We demonstrate efficacy of online distillation and validate the theoretical findings on a range of image classification benchmarks and model architectures.
arXiv Detail & Related papers (2023-01-28T16:34:47Z)
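
As a rough illustration of one quantity in that analysis, the sketch below computes a student margin with respect to the teacher's predicted class: how far the student's logit for that class exceeds its best competing logit. This is one plausible reading, not necessarily the paper's exact definition.

```python
# Hypothetical illustration of a "margin with respect to teacher predictions".
# The exact definition in the paper may differ.
import torch

def student_margin(student_logits, teacher_logits):
    teacher_pred = teacher_logits.argmax(dim=1)                         # (B,)
    target_logit = student_logits.gather(1, teacher_pred[:, None]).squeeze(1)
    # Mask out the teacher-predicted class, then take the runner-up logit.
    masked = student_logits.clone()
    masked.scatter_(1, teacher_pred[:, None], float("-inf"))
    runner_up = masked.max(dim=1).values
    return target_logit - runner_up                                     # (B,)

s = torch.randn(8, 10)  # student logits
t = torch.randn(8, 10)  # teacher logits
print(student_margin(s, t))
```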
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing knowledge-representation-related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Extracting knowledge from features with multilevel abstraction [3.4443503349903124]
Self-knowledge distillation (SKD) aims at transferring the knowledge from a large teacher model to a small student model.
In this paper, we propose a novel SKD method that differs from the mainstream approaches.
Experiments and ablation studies show its great effectiveness and generalization on various kinds of tasks.
arXiv Detail & Related papers (2021-12-04T02:25:46Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Student Network Learning via Evolutionary Knowledge Distillation [22.030934154498205]
We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge.
Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly.
In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
arXiv Detail & Related papers (2021-03-23T02:07:15Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
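
As a generic illustration of transferring "the similarity between those self-supervision signals", the sketch below matches the pairwise cosine-similarity structure of teacher and student embeddings; in SSKD the embeddings come from a self-supervised auxiliary task, and all names and the MSE matching here are illustrative assumptions, not the authors' exact objective.

```python
# Generic similarity-transfer sketch: align the pairwise cosine-similarity
# matrices of teacher and student embeddings. Names are hypothetical.
import torch
import torch.nn.functional as F

def similarity_transfer_loss(emb_s, emb_t):
    # emb_s: (B, Ds) student embeddings; emb_t: (B, Dt) teacher embeddings
    zs = F.normalize(emb_s, dim=1)
    zt = F.normalize(emb_t, dim=1)
    sim_s = zs @ zs.t()  # (B, B) student pairwise similarities
    sim_t = zt @ zt.t()  # (B, B) teacher pairwise similarities
    return F.mse_loss(sim_s, sim_t)

emb_s = torch.randn(16, 128)
emb_t = torch.randn(16, 256)  # embedding dimensions need not match
print(similarity_transfer_loss(emb_s, emb_t).item())
```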