Fixing the Teacher-Student Knowledge Discrepancy in Distillation
- URL: http://arxiv.org/abs/2103.16844v1
- Date: Wed, 31 Mar 2021 06:52:20 GMT
- Title: Fixing the Teacher-Student Knowledge Discrepancy in Distillation
- Authors: Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li,
Xiaogang Wang
- Abstract summary: We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can easily be combined with other state-of-the-art approaches.
- Score: 72.4354883997316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training a small student network with the guidance of a larger teacher
network is an effective way to promote the performance of the student. Despite
the different types, the guided knowledge used for distillation has always been kept
unchanged across different teacher-student pairs in previous knowledge
distillation methods. However, we find that teacher and student models with
different architectures, or trained from different initializations, can have
distinct feature representations across channels (e.g., which channels are most
strongly activated for a given category). We name this incongruous representation of
channels as teacher-student knowledge discrepancy in the distillation process.
Ignoring this knowledge discrepancy makes it harder for the student to learn
from the teacher. To solve this problem, we propose a novel student-dependent
distillation method, knowledge consistent distillation, which makes the
teacher's knowledge more consistent with the student and provides the most
suitable knowledge to each student network for distillation. Extensive experiments on different
datasets (CIFAR100, ImageNet, COCO) and tasks (image classification, object
detection) reveal the widely existing knowledge discrepancy problem between
teachers and students and demonstrate the effectiveness of our proposed method.
Our method is very flexible and can easily be combined with other
state-of-the-art approaches.
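To make the channel-discrepancy idea concrete, here is a minimal NumPy sketch (not the authors' implementation; all function names are hypothetical) that re-orders teacher feature channels to best match the student's channels by cosine similarity before measuring a feature-distillation distance:

```python
import numpy as np

def match_channels(teacher_feats, student_feats):
    """Greedily pair each student channel with its most similar
    teacher channel (cosine similarity of flattened activations).
    Both arrays have shape (channels, samples). Returns a permutation
    `perm` such that teacher_feats[perm] is channel-aligned with
    student_feats."""
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=1, keepdims=True) + 1e-8)
    s = student_feats / (np.linalg.norm(student_feats, axis=1, keepdims=True) + 1e-8)
    sim = s @ t.T                      # (C_student, C_teacher)
    perm, used = [], set()
    for row in sim:                    # one pass per student channel
        for j in np.argsort(-row):     # best-first among unused teacher channels
            if j not in used:
                used.add(j)
                perm.append(j)
                break
    return np.array(perm)

def feature_distill_loss(teacher_feats, student_feats):
    """Mean-squared error after channel alignment."""
    perm = match_channels(teacher_feats, student_feats)
    return np.mean((teacher_feats[perm] - student_feats) ** 2)

# Toy check: a teacher whose channels are a shuffled copy of the
# student's should incur zero loss once channels are re-aligned.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 16))
teacher = student[rng.permutation(4)]
print(feature_distill_loss(teacher, student))  # → 0.0
```

Without the alignment step, the same shuffled teacher would produce a large, uninformative loss; this is the discrepancy the abstract describes. A production version would operate on convolutional feature maps and could use an optimal (e.g., Hungarian) rather than greedy assignment.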
Related papers
- Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation [11.754014876977422]
This paper introduces a novel student-oriented perspective, refining the teacher's knowledge to better align with the student's needs.
We present the Student-Oriented Knowledge Distillation (SoKD), which incorporates a learnable feature augmentation strategy during training.
We also deploy the Distinctive Area Detection Module (DAM) to identify areas of mutual interest between the teacher and student.
arXiv Detail & Related papers (2024-09-27T14:34:08Z) - Unified and Effective Ensemble Knowledge Distillation [92.67156911466397]
Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model.
Many existing methods learn and distill the student model on labeled data only.
We propose a unified and effective ensemble knowledge distillation method that distills a single student model from an ensemble of teacher models on both labeled and unlabeled data.
arXiv Detail & Related papers (2022-04-01T16:15:39Z) - Does Knowledge Distillation Really Work? [106.38447017262183]
We show that while knowledge distillation can improve student generalization, it does not typically work as it is commonly understood.
We identify difficulties in optimization as a key reason for why the student is unable to match the teacher.
arXiv Detail & Related papers (2021-06-10T17:44:02Z) - Student Network Learning via Evolutionary Knowledge Distillation [22.030934154498205]
We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge.
Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly.
In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
arXiv Detail & Related papers (2021-03-23T02:07:15Z) - Distilling Knowledge via Intermediate Classifier Heads [0.5584060970507505]
Knowledge distillation is a transfer-learning approach to train a resource-limited student model under the guidance of a larger pre-trained teacher model.
We introduce knowledge distillation via intermediate heads to mitigate the impact of the capacity gap.
Our experiments on various teacher-student pairs and datasets have demonstrated that the proposed approach outperforms the canonical knowledge distillation approach.
arXiv Detail & Related papers (2021-02-28T12:52:52Z) - Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
In contrast to most existing methods, which rely on effectively training student models given pretrained teachers, we aim to learn teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z) - Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z) - Densely Guided Knowledge Distillation using Multiple Teacher Assistants [5.169724825219126]
We propose a densely guided knowledge distillation using multiple teacher assistants that gradually decreases the model size.
We also design a teaching scheme in which, for each mini-batch, the teacher or teacher assistants are randomly dropped.
This acts as a regularizer that improves the efficiency of teaching the student network.
arXiv Detail & Related papers (2020-09-18T13:12:52Z) - Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
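Nearly all of the methods listed above extend the classic soft-target distillation objective, in which the student matches temperature-softened teacher probabilities in addition to fitting the ground-truth labels. A minimal NumPy sketch of that baseline loss (hypothetical names, for orientation only):

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """alpha-weighted mix of (a) KL divergence between temperature-
    softened teacher and student distributions and (b) ordinary
    cross-entropy against the hard labels. The T**2 factor keeps the
    soft-target gradients on the same scale as the hard-label term."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1))
    hard = softmax(student_logits)
    ce = -np.mean(np.log(hard[np.arange(len(labels)), labels] + 1e-12))
    return alpha * (T ** 2) * kl + (1 - alpha) * ce

# When the student already reproduces the teacher's logits, the KL
# term vanishes and only the hard-label term remains.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
labels = np.array([0, 1])
print(kd_loss(logits, logits, labels))
```

The papers above vary what goes into this objective: which teacher (ensemble, evolving, assistant chain), which features (channel-aligned, intermediate heads), and which data (labeled plus unlabeled), while the soft-target structure stays the same.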
This list is automatically generated from the titles and abstracts of the papers in this site.