Learning Student-Friendly Teacher Networks for Knowledge Distillation
- URL: http://arxiv.org/abs/2102.07650v2
- Date: Tue, 16 Feb 2021 13:09:53 GMT
- Title: Learning Student-Friendly Teacher Networks for Knowledge Distillation
- Authors: Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Daesin Kim, Bohyung Han
- Abstract summary: We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
- Score: 50.11640959363315
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a novel knowledge distillation approach to facilitate the transfer
of dark knowledge from a teacher to a student. Contrary to most of the existing
methods that rely on effective training of student models given pretrained
teachers, we aim to learn the teacher models that are friendly to students and,
consequently, more appropriate for knowledge transfer. In other words, even at
the time of optimizing a teacher model, the proposed algorithm learns the
student branches jointly to obtain student-friendly representations. Since the
main goal of our approach lies in training teacher models and the subsequent
knowledge distillation procedure is straightforward, most of the existing
knowledge distillation algorithms can adopt this technique to improve the
performance of the student models in terms of accuracy and convergence speed.
The proposed algorithm demonstrates outstanding accuracy in several well-known
knowledge distillation techniques with various combinations of teacher and
student architectures.
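The abstract describes the approach only at a high level. As a rough illustration of the core idea (optimizing the teacher jointly with an auxiliary student branch so that the shared representations become easier to distill), the PyTorch-style sketch below attaches a single hypothetical student branch to intermediate teacher features and trains both with a combined loss. The module names, the single-branch design, the temperature, and the loss weights are assumptions for illustration, not details from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Soft-label ("dark knowledge") loss between two sets of logits.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

class StudentFriendlyTeacher(nn.Module):
    """Teacher backbone trained jointly with a lightweight student branch
    attached to an intermediate feature map (hypothetical architecture)."""
    def __init__(self, teacher_backbone, teacher_head, student_branch, student_head):
        super().__init__()
        self.teacher_backbone = teacher_backbone  # produces intermediate features
        self.teacher_head = teacher_head          # teacher's own classifier
        self.student_branch = student_branch      # mimics the student's later layers
        self.student_head = student_head          # classifier on top of the branch

    def forward(self, x):
        feats = self.teacher_backbone(x)
        teacher_logits = self.teacher_head(feats)
        branch_logits = self.student_head(self.student_branch(feats))
        return teacher_logits, branch_logits

def teacher_training_step(model, x, y, optimizer, alpha=1.0, beta=1.0):
    # The teacher is supervised by the labels, while the student branch is
    # trained on the labels and to match the teacher's softened outputs; the
    # gradients flowing back through the shared features push them toward
    # being "student-friendly".
    teacher_logits, branch_logits = model(x)
    loss = (F.cross_entropy(teacher_logits, y)
            + alpha * F.cross_entropy(branch_logits, y)
            + beta * kd_loss(branch_logits, teacher_logits))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After this joint phase the teacher would be frozen and any standard distillation recipe applied to the actual student, which is consistent with the abstract's claim that existing distillation algorithms can adopt the technique.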
Related papers
- Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation [11.754014876977422]
This paper introduces a novel, student-oriented perspective, refining the teacher's knowledge to better align with the student's needs.
We present the Student-Oriented Knowledge Distillation (SoKD), which incorporates a learnable feature augmentation strategy during training.
We also deploy the Distinctive Area Detection Module (DAM) to identify areas of mutual interest between the teacher and student.
arXiv Detail & Related papers (2024-09-27T14:34:08Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing knowledge-representation-related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Dynamic Rectification Knowledge Distillation [0.0]
Dynamic Rectification Knowledge Distillation (DR-KD) is a knowledge distillation framework.
DR-KD transforms the student into its own teacher, and if the self-teacher makes wrong predictions while distilling information, the error is rectified prior to the knowledge being distilled.
Our proposed DR-KD performs remarkably well in the absence of a sophisticated, cumbersome teacher model (a minimal sketch of the rectification idea appears after this list).
arXiv Detail & Related papers (2022-01-27T04:38:01Z)
- Improved Knowledge Distillation via Adversarial Collaboration [2.373824287636486]
A small student model is trained to exploit the knowledge of a large, well-trained teacher model.
Due to the capacity gap between the teacher and the student, it is hard for the student's performance to reach the level of the teacher.
We propose an Adversarial Collaborative Knowledge Distillation (ACKD) method that effectively improves the performance of knowledge distillation.
arXiv Detail & Related papers (2021-11-29T07:20:46Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Student Network Learning via Evolutionary Knowledge Distillation [22.030934154498205]
We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge.
Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly.
In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
arXiv Detail & Related papers (2021-03-23T02:07:15Z)
- Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation (IAKD) scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods (one possible reading of the swapping-in operation is sketched after this list).
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
- Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)
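Among the related papers above, Dynamic Rectification Knowledge Distillation (DR-KD) is summarized as correcting the self-teacher's wrong predictions before distilling from them. The sketch below shows one way such rectification could look in PyTorch: when the teacher's argmax disagrees with the label, the logits of the predicted and true classes are swapped so that the softened target peaks on the correct class. The swapping rule, temperature, and loss weighting are assumptions, not the authors' exact formulation.

```python
import torch.nn.functional as F

def rectify_logits(logits, targets):
    # For samples where the (self-)teacher predicts the wrong class, swap the
    # logit of the predicted class with the logit of the true class so the
    # rectified distribution is peaked on the correct label (illustrative rule).
    rectified = logits.clone()
    pred = logits.argmax(dim=1)
    rows = (pred != targets).nonzero(as_tuple=True)[0]
    true_vals = rectified[rows, targets[rows]]
    pred_vals = rectified[rows, pred[rows]]
    rectified[rows, targets[rows]] = pred_vals
    rectified[rows, pred[rows]] = true_vals
    return rectified

def dr_kd_style_loss(student_logits, self_teacher_logits, targets, T=4.0, alpha=0.5):
    # Distill from the rectified self-teacher logits and mix in the usual
    # hard-label cross-entropy.
    teacher_logits = rectify_logits(self_teacher_logits.detach(), targets)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```

As the summary suggests, the self-teacher logits here could come from an earlier snapshot of the student itself rather than from a separate, cumbersome teacher model.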
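Interactive Knowledge Distillation (IAKD) describes its teacher-student interaction only as a swapping-in operation. One plausible reading, assumed in the sketch below, is that randomly selected student blocks are replaced by the corresponding frozen teacher blocks during a training forward pass; the block granularity, the swap probability, and the requirement that paired blocks produce compatible activation shapes are all assumptions.

```python
import random
import torch.nn as nn

class SwapInStudent(nn.Module):
    # Student whose blocks can be stochastically replaced by the corresponding
    # frozen teacher blocks during training (one reading of "swapping-in").
    def __init__(self, student_blocks, teacher_blocks, classifier, swap_prob=0.3):
        super().__init__()
        assert len(student_blocks) == len(teacher_blocks)
        self.student_blocks = nn.ModuleList(student_blocks)
        self.teacher_blocks = nn.ModuleList(teacher_blocks)
        for p in self.teacher_blocks.parameters():
            p.requires_grad_(False)  # teacher blocks stay frozen
        self.classifier = classifier
        self.swap_prob = swap_prob

    def forward(self, x):
        for s_block, t_block in zip(self.student_blocks, self.teacher_blocks):
            # Occasionally route the activation through the teacher block
            # instead of the student block while training.
            if self.training and random.random() < self.swap_prob:
                x = t_block(x)
            else:
                x = s_block(x)
        return self.classifier(x)
```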
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.