Student Network Learning via Evolutionary Knowledge Distillation
- URL: http://arxiv.org/abs/2103.13811v1
- Date: Tue, 23 Mar 2021 02:07:15 GMT
- Title: Student Network Learning via Evolutionary Knowledge Distillation
- Authors: Kangkai Zhang, Chunhui Zhang, Shikun Li, Dan Zeng, Shiming Ge
- Abstract summary: We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge.
Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly.
In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
- Score: 22.030934154498205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation provides an effective way to transfer knowledge via
teacher-student learning, where most existing distillation approaches apply a
fixed pre-trained model as the teacher to supervise the learning of the student
network. This setup usually leaves a large capability gap between the teacher and
student networks during learning. Recent studies have observed that a small
teacher-student capability gap can facilitate knowledge transfer. Inspired by
that, we propose an evolutionary knowledge distillation approach to improve the
transfer effectiveness of teacher knowledge. Instead of a fixed pre-trained
teacher, an evolutionary teacher is learned online and consistently transfers
intermediate knowledge to supervise student network learning on-the-fly. To
enhance intermediate knowledge representation and mimicking, several simple
guided modules are introduced between corresponding teacher-student blocks. In
this way, the student can simultaneously obtain rich internal knowledge and
capture its growth process, leading to effective student network learning.
Extensive experiments clearly demonstrate the effectiveness of our approach as
well as good adaptability in the low-resolution and few-sample visual
recognition scenarios.
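The core mechanism can be pictured with a short training-loop sketch. The following is a minimal illustration under stated assumptions rather than the authors' implementation: the guided modules are taken to be plain 1x1-conv adapters, both networks are assumed to return their logits together with a list of per-block feature maps, and the loss weights `alpha`, `beta` and temperature `T` are placeholders.

```python
import torch.nn as nn
import torch.nn.functional as F

class GuidedModule(nn.Module):
    """Assumed guided module: a 1x1 conv that projects a student feature
    map to the teacher's channel width so the two can be compared."""
    def __init__(self, s_channels, t_channels):
        super().__init__()
        self.proj = nn.Conv2d(s_channels, t_channels, kernel_size=1)

    def forward(self, s_feat):
        return self.proj(s_feat)


def ekd_step(teacher, student, guides, images, labels, opt_t, opt_s,
             T=4.0, alpha=0.5, beta=1.0):
    """One joint update: the teacher keeps learning from the task loss,
    while the student mimics the teacher's current, intermediate knowledge.
    `opt_s` is assumed to also cover the parameters of `guides`."""
    # Both networks are assumed to return (logits, [per-block feature maps]).
    t_logits, t_feats = teacher(images)
    s_logits, s_feats = student(images)

    # 1) Online teacher update: the teacher evolves during distillation.
    loss_t = F.cross_entropy(t_logits, labels)
    opt_t.zero_grad()
    loss_t.backward()
    opt_t.step()

    # 2) Student update: task loss + soft-label distillation
    #    + block-wise mimicking through the guided modules.
    loss_ce = F.cross_entropy(s_logits, labels)
    loss_kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                       F.softmax(t_logits.detach() / T, dim=1),
                       reduction="batchmean") * T * T
    loss_feat = sum(F.mse_loss(g(sf), tf.detach())
                    for g, sf, tf in zip(guides, s_feats, t_feats))

    loss_s = loss_ce + alpha * loss_kd + beta * loss_feat
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()
    return loss_t.item(), loss_s.item()
```

In a conventional fixed-teacher setup only the student branch of this loop would run; here the teacher's own optimizer step is what makes the supervision evolutionary, so the student mimics a sequence of progressively stronger intermediate targets rather than a single final one.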
Related papers
- Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation [11.754014876977422]
This paper introduces a novel student-oriented perspective, refining the teacher's knowledge to better align with the student's needs.
We present the Student-Oriented Knowledge Distillation (SoKD), which incorporates a learnable feature augmentation strategy during training.
We also deploy the Distinctive Area Detection Module (DAM) to identify areas of mutual interest between the teacher and student.
arXiv Detail & Related papers (2024-09-27T14:34:08Z) - Improving Knowledge Distillation via Transferring Learning Ability [15.62306809592042]
Existing knowledge distillation methods generally use a teacher-student approach, where the student network solely learns from a well-trained teacher.
This approach overlooks the inherent differences in learning abilities between the teacher and student networks, thus causing the capacity-gap problem.
We propose a novel method called SLKD to address this limitation.
arXiv Detail & Related papers (2023-04-24T09:06:06Z) - Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of requiring the teacher to work on the same task as the student, we borrow knowledge from a teacher trained on a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z) - Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z) - Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z) - Distilling Knowledge via Intermediate Classifier Heads [0.5584060970507505]
Knowledge distillation is a transfer-learning approach to train a resource-limited student model under the guidance of a larger pre-trained teacher model.
We introduce knowledge distillation via intermediate heads to mitigate the impact of the capacity gap; a rough sketch of this idea follows the list below.
Our experiments on various teacher-student pairs and datasets have demonstrated that the proposed approach outperforms the canonical knowledge distillation approach.
arXiv Detail & Related papers (2021-02-28T12:52:52Z) - Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
In contrast to most existing methods, which rely on effective training of student models given pretrained teachers, we aim to learn teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z) - Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive generation of learning materials and the teacher/student updates can be repeated more than once, iteratively improving the network's capability.
arXiv Detail & Related papers (2020-08-26T06:39:24Z) - Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
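For contrast with the evolutionary-teacher loop above, the intermediate-classifier-heads idea referenced in the list can be sketched roughly as follows. This is again a hedged illustration rather than that paper's implementation: the head design (global pooling plus one linear layer), the plain averaging of the distillation terms, and the assumption that the heads were trained beforehand on the frozen teacher's intermediate features are all choices made here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntermediateHead(nn.Module):
    """Assumed auxiliary head: global-pools an intermediate teacher feature
    map and maps it to class logits with a single linear layer."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, feat):
        return self.fc(self.pool(feat).flatten(1))


def intermediate_head_kd_loss(s_logits, t_logits, t_feats, heads, labels,
                              T=4.0, alpha=0.5):
    """Distill the student from the final teacher logits and from every
    intermediate head; simple averaging of the terms is an assumption."""
    def soft_kl(student, teacher):
        return F.kl_div(F.log_softmax(student / T, dim=1),
                        F.softmax(teacher.detach() / T, dim=1),
                        reduction="batchmean") * T * T

    kd_terms = [soft_kl(s_logits, t_logits)]
    kd_terms += [soft_kl(s_logits, head(f)) for head, f in zip(heads, t_feats)]
    loss_kd = torch.stack(kd_terms).mean()
    loss_ce = F.cross_entropy(s_logits, labels)
    return (1 - alpha) * loss_ce + alpha * loss_kd
```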
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.