Multi-granularity for knowledge distillation
- URL: http://arxiv.org/abs/2108.06681v1
- Date: Sun, 15 Aug 2021 07:47:08 GMT
- Title: Multi-granularity for knowledge distillation
- Authors: Baitan Shao, Ying Chen
- Abstract summary: Students have different abilities to understand the knowledge imparted by teachers.
A multi-granularity self-analyzing module of the teacher network is designed.
A stable excitation scheme is proposed for robust supervision for the student training.
- Score: 3.3970049571884204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Considering the fact that students have different abilities to understand the
knowledge imparted by teachers, a multi-granularity distillation mechanism is
proposed for transferring more understandable knowledge for student networks. A
multi-granularity self-analyzing module of the teacher network is designed,
which enables the student network to learn knowledge from different teaching
patterns. Furthermore, a stable excitation scheme is proposed for robust
supervision for the student training. The proposed distillation mechanism can
be embedded into different distillation frameworks, which are taken as
baselines. Experiments show the mechanism improves accuracy by 0.58% on
average, and by 1.08% in the best case, over the baselines, making its
performance superior to state-of-the-art methods. It is also shown that the
student's fine-tuning ability and robustness to noisy inputs can be improved
via the proposed mechanism. The code is available at
https://github.com/shaoeric/multi-granularity-distillation.
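
The abstract does not spell out the self-analyzing module in code; as one way to make the idea concrete, the sketch below approximates multi-granularity supervision by pooling teacher and student feature maps at several spatial granularities and matching them. The class name, the choice of granularities, and the MSE matching are illustrative assumptions, not taken from the authors' repository.

```python
# Illustrative sketch only: approximates the paper's multi-granularity idea by
# letting the student match teacher feature maps pooled at several spatial
# granularities. Names and pooling sizes are hypothetical, not the authors' API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGranularityLoss(nn.Module):
    """Match teacher/student feature maps at several pooled resolutions."""

    def __init__(self, granularities=(1, 2, 4)):
        super().__init__()
        self.granularities = granularities  # output sizes of adaptive pooling

    def forward(self, feat_s, feat_t):
        # feat_s, feat_t: (B, C, H, W) feature maps with matching channels
        loss = 0.0
        for g in self.granularities:
            ps = F.adaptive_avg_pool2d(feat_s, g)  # coarse ... fine views
            pt = F.adaptive_avg_pool2d(feat_t, g)
            loss = loss + F.mse_loss(ps, pt)
        return loss / len(self.granularities)

# Usage with dummy tensors standing in for teacher/student features:
if __name__ == "__main__":
    feat_t = torch.randn(8, 64, 16, 16)  # teacher feature map
    feat_s = torch.randn(8, 64, 16, 16)  # student feature map
    criterion = MultiGranularityLoss()
    print(criterion(feat_s, feat_t).item())
```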
Related papers
- Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification [0.0]
This paper proposes an uncertainty-aware dual-student knowledge distillation framework. We introduce a peer-learning mechanism where two heterogeneous student architectures learn collaboratively from both the teacher network and each other.
arXiv Detail & Related papers (2025-11-24T07:02:22Z)
- Balance Divergence for Knowledge Distillation [5.971722196386694]
Most existing knowledge distillation methods utilize Kullback-Leibler divergence to mimic the logit output probabilities between the teacher network and the student network.
This deficiency may lead to suboptimal performance in logit mimicry during the distillation process.
In this paper, we propose a novel method, named Balance Divergence Distillation.
arXiv Detail & Related papers (2025-01-14T03:12:25Z)
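
For context, the KL-based logit mimicry that this entry revisits is the standard temperature-scaled distillation loss of Hinton et al.; a minimal PyTorch sketch follows. The temperature value and weighting are conventional choices, not taken from this paper.

```python
# Standard temperature-scaled KL logit distillation, shown as context for the
# KL mimicry that Balance Divergence revisits.
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T, then take KL(teacher || student).
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    # T**2 rescales gradients back toward their original magnitude.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T ** 2)

# Example with random logits:
s = torch.randn(8, 10)  # student logits
t = torch.randn(8, 10)  # teacher logits
print(kd_kl_loss(s, t).item())
```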
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z)
- Supervision Complexity and its Role in Knowledge Distillation [65.07910515406209]
We study the generalization behavior of a distilled student.
The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher predictions, and the complexity of the teacher predictions.
We demonstrate efficacy of online distillation and validate the theoretical findings on a range of image classification benchmarks and model architectures.
arXiv Detail & Related papers (2023-01-28T16:34:47Z)
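
As a rough illustration of one quantity in that analysis, the sketch below computes a student margin with respect to the teacher's predicted class: how far the student's logit for that class exceeds its best competing logit. This is one plausible reading, not necessarily the paper's exact definition.

```python
# Hypothetical illustration of a "margin with respect to teacher predictions".
# The exact definition in the paper may differ.
import torch

def student_margin(student_logits, teacher_logits):
    teacher_pred = teacher_logits.argmax(dim=1)                         # (B,)
    target_logit = student_logits.gather(1, teacher_pred[:, None]).squeeze(1)
    # Mask out the teacher-predicted class, then take the runner-up logit.
    masked = student_logits.clone()
    masked.scatter_(1, teacher_pred[:, None], float("-inf"))
    runner_up = masked.max(dim=1).values
    return target_logit - runner_up                                     # (B,)

s = torch.randn(8, 10)  # student logits
t = torch.randn(8, 10)  # teacher logits
print(student_margin(s, t))
```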
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing knowledge-representation-related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Extracting knowledge from features with multilevel abstraction [3.4443503349903124]
Self-knowledge distillation (SKD) aims at transferring the knowledge from a large teacher model to a small student model.
In this paper, we propose a novel SKD method that differs from the mainstream approaches.
Experiments and ablation studies show its great effectiveness and generalization on various kinds of tasks.
arXiv Detail & Related papers (2021-12-04T02:25:46Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Student Network Learning via Evolutionary Knowledge Distillation [22.030934154498205]
We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge.
Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly.
In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
arXiv Detail & Related papers (2021-03-23T02:07:15Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
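
As a generic illustration of transferring "the similarity between those self-supervision signals", the sketch below matches the pairwise cosine-similarity structure of teacher and student embeddings; in SSKD the embeddings come from a self-supervised auxiliary task, and all names and the MSE matching here are illustrative assumptions, not the authors' exact objective.

```python
# Generic similarity-transfer sketch: align the pairwise cosine-similarity
# matrices of teacher and student embeddings. Names are hypothetical.
import torch
import torch.nn.functional as F

def similarity_transfer_loss(emb_s, emb_t):
    # emb_s: (B, Ds) student embeddings; emb_t: (B, Dt) teacher embeddings
    zs = F.normalize(emb_s, dim=1)
    zt = F.normalize(emb_t, dim=1)
    sim_s = zs @ zs.t()  # (B, B) student pairwise similarities
    sim_t = zt @ zt.t()  # (B, B) teacher pairwise similarities
    return F.mse_loss(sim_s, sim_t)

emb_s = torch.randn(16, 128)
emb_t = torch.randn(16, 256)  # embedding dimensions need not match
print(similarity_transfer_loss(emb_s, emb_t).item())
```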