A Survey on Recent Teacher-student Learning Studies
- URL: http://arxiv.org/abs/2304.04615v1
- Date: Mon, 10 Apr 2023 14:30:28 GMT
- Title: A Survey on Recent Teacher-student Learning Studies
- Authors: Minghong Gao
- Abstract summary: Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN.
Recent variants of knowledge distillation include teaching assistant distillation, curriculum distillation, mask distillation, and decoupling distillation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation is a method of transferring the knowledge from a
complex deep neural network (DNN) to a smaller and faster DNN, while preserving
its accuracy. Recent variants of knowledge distillation include teaching
assistant distillation, curriculum distillation, mask distillation, and
decoupling distillation, which aim to improve the performance of knowledge
distillation by introducing additional components or by changing the learning
process. Teaching assistant distillation involves an intermediate model called
the teaching assistant, while curriculum distillation follows a curriculum
similar to human education. Mask distillation focuses on transferring the
attention mechanism learned by the teacher, and decoupling distillation
decouples the distillation loss from the task loss. Overall, these variants have
shown promising results in improving the performance of knowledge distillation.
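As a common point of reference for the variants above, the sketch below shows the standard temperature-softened distillation objective they build on, written in PyTorch. The temperature, the loss weights, and the function name are illustrative assumptions rather than values taken from any surveyed paper; decoupling-style methods differ mainly in treating the two terms (or finer-grained parts of the KL term) as separately weighted objectives instead of one fixed mixture.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=1.0, beta=1.0):
    """Weighted sum of the task loss and the distillation loss.

    T, alpha, and beta are illustrative hyperparameters, not values
    reported by any of the surveyed papers.
    """
    # Task loss: ordinary cross-entropy against the ground-truth labels.
    task_loss = F.cross_entropy(student_logits, labels)

    # Distillation loss: KL divergence between temperature-softened teacher
    # and student distributions; the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    distill_loss = F.kl_div(log_soft_student, soft_teacher,
                            reduction="batchmean") * (T * T)

    # Decoupling-style variants keep these terms (or sub-terms of the KL)
    # as separate objectives rather than a single fixed mixture.
    return alpha * task_loss + beta * distill_loss
```

With batched logits of shape (N, C) from a frozen teacher and a trainable student, `kd_loss(student(x), teacher(x).detach(), y)` can be minimized with any standard optimizer; teaching-assistant distillation would apply the same kind of objective twice, first teacher-to-assistant and then assistant-to-student.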
Related papers
- Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation [52.53446712834569]
Learning Good Teacher Matters (LGTM) is an efficient training technique for incorporating distillation influence into the teacher's learning process.
Our LGTM outperforms 10 common knowledge distillation baselines on 6 text classification tasks in the GLUE benchmark.
arXiv Detail & Related papers (2023-05-16T17:50:09Z)
- Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation [68.8204255655161]
Trojan attacks have contemporaneously gained significant prominence, revealing fundamental vulnerabilities in deep learning models.
We seek to exploit the unlabelled data knowledge distillation process to embed Trojans in a student model without introducing conspicuous behavior in the teacher.
We devise a Trojan attack that effectively reduces student accuracy, does not alter teacher performance, and is efficiently constructible in practice.
arXiv Detail & Related papers (2023-03-09T21:37:50Z)
- Supervision Complexity and its Role in Knowledge Distillation [65.07910515406209]
We study the generalization behavior of a distilled student.
The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher predictions, and the complexity of the teacher predictions.
We demonstrate efficacy of online distillation and validate the theoretical findings on a range of image classification benchmarks and model architectures.
arXiv Detail & Related papers (2023-01-28T16:34:47Z)
- Class-aware Information for Logit-based Knowledge Distillation [16.634819319915923]
We propose a Class-aware Logit Knowledge Distillation (CLKD) method, which extends logit distillation to both the instance level and the class level.
CLKD enables the student model to mimic higher-level semantic information from the teacher model, hence improving distillation performance.
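The minimal sketch below illustrates what instance-level versus class-level logit matching can look like. It is one plausible reading of the idea (transposing the batch logit matrix so that each class's scores across the batch are compared), written in PyTorch, and is not claimed to be the exact CLKD formulation.

```python
import torch
import torch.nn.functional as F

def instance_level_kd(student_logits, teacher_logits, T=4.0):
    # Standard logit distillation: for each sample, match the
    # distribution over classes.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def class_level_kd(student_logits, teacher_logits, T=4.0):
    # Hypothetical class-level counterpart: transpose the (N, C) logit
    # matrix so that, for each class, the distribution over the samples
    # in the batch is matched instead.
    p_teacher = F.softmax(teacher_logits.t() / T, dim=1)
    log_p_student = F.log_softmax(student_logits.t() / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```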
arXiv Detail & Related papers (2022-11-27T09:27:50Z)
- DETRDistill: A Universal Knowledge Distillation Framework for DETR-families [11.9748352746424]
Transformer-based detectors (DETRs) have attracted great attention due to their sparse training paradigm and the removal of post-processing operations.
Knowledge distillation (KD) can be employed to compress the huge model by constructing a universal teacher-student learning framework.
arXiv Detail & Related papers (2022-11-17T13:35:11Z)
- Spot-adaptive Knowledge Distillation [39.23627955442595]
We propose a new distillation strategy, termed spot-adaptive KD (SAKD).
SAKD adaptively determines the distillation spots in the teacher network per sample, at every training iteration during the whole distillation period.
Experiments with 10 state-of-the-art distillers are conducted to demonstrate the effectiveness of SAKD.
arXiv Detail & Related papers (2022-05-05T02:21:32Z)
- Controlling the Quality of Distillation in Response-Based Network Compression [0.0]
The performance of a compressed network is governed by the quality of distillation.
For a given teacher-student pair, the quality of distillation can be improved by finding the sweet spot between batch size and number of epochs while training the teacher.
arXiv Detail & Related papers (2021-12-19T02:53:51Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
- A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models [3.770437296936382]
We review the characteristics of knowledge distillation from the hypothesis that its three important ingredients are the distilled knowledge and loss, the teacher-student paradigm, and the distillation process.
We also outline future directions for knowledge distillation, including explainable knowledge distillation, where the performance gain is analyzed analytically, and self-supervised learning, a hot research topic in the deep learning community.
arXiv Detail & Related papers (2020-11-30T05:22:02Z)
- Why distillation helps: a statistical perspective [69.90148901064747]
Knowledge distillation is a technique for improving the performance of a simple "student" model by training it to match the predictions of a more complex "teacher" model.
While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help?
We show how distillation complements existing negative mining techniques for extreme multiclass retrieval.
arXiv Detail & Related papers (2020-05-21T01:49:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.