Knowledge Distillation and Student-Teacher Learning for Visual
Intelligence: A Review and New Outlooks
- URL: http://arxiv.org/abs/2004.05937v7
- Date: Thu, 17 Jun 2021 07:17:50 GMT
- Title: Knowledge Distillation and Student-Teacher Learning for Visual
Intelligence: A Review and New Outlooks
- Authors: Lin Wang and Kuk-Jin Yoon
- Abstract summary: Knowledge distillation (KD) has been proposed to transfer information learned from one model to another.
This paper reviews KD and S-T learning, which have been actively studied in recent years.
- Score: 39.2907363775529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural models have been successful in almost every field in
recent years, including on extremely complex problems. However, these models
are huge, with millions (and even billions) of parameters, and thus demand
heavy computational power and cannot be deployed on edge devices. Besides, the
performance boost is highly dependent on abundant labeled data. To achieve
faster speeds and to cope with the lack of labeled data, knowledge distillation
(KD) has been proposed to transfer information learned from one model to
another. KD is often characterized by the so-called "Student-Teacher" (S-T)
learning framework and has been broadly applied in model compression and
knowledge transfer. This paper reviews KD and S-T learning, which have been
actively studied in recent years. First, we aim to explain what KD is and
how/why it works. Then, we provide a comprehensive survey of recent progress in
KD methods and S-T frameworks, typically for vision tasks. In general, we
consider the fundamental questions that have been driving this research area
and systematically summarize the research progress and technical details.
Additionally, we analyze the research status of KD in vision applications.
Finally, we discuss the potential and open challenges of existing methods and
outline future directions for KD and S-T learning.
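As a concrete companion to the abstract, below is a minimal sketch of the vanilla response-based S-T objective (softened-logit matching in the style of Hinton et al.'s original KD), assuming PyTorch; `teacher`, `student`, `optimizer`, and the data tensors are hypothetical placeholders rather than code from the surveyed paper.

```python
# Minimal sketch of vanilla KD (a frozen teacher supervises a smaller student).
# Assumes PyTorch; models, optimizer, and tensors are placeholders.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation: KL between temperature-softened
    distributions plus cross-entropy on the ground-truth labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # T^2 keeps the gradient scale of the soft term comparable to the hard term.
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

def distill_step(student, teacher, images, labels, optimizer):
    teacher.eval()
    with torch.no_grad():                   # the teacher is not updated
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = kd_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The temperature T softens both distributions so that the teacher's inter-class similarities carry a useful training signal, and alpha balances the soft distillation term against the ordinary cross-entropy on the ground-truth labels.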
Related papers
- Applications of Knowledge Distillation in Remote Sensing: A Survey [3.481234252899159]
Knowledge distillation (KD) is a technique developed to transfer knowledge from a complex, often cumbersome model (teacher) to a more compact and efficient model (student).
The article provides a comprehensive taxonomy of KD techniques, with each category critically analyzed to demonstrate the breadth and depth of the alternatives.
The review discusses the challenges and limitations of KD in RS, including practical constraints and prospective future directions.
arXiv Detail & Related papers (2024-09-18T16:30:49Z)
- A Question-centric Multi-experts Contrastive Learning Framework for Improving the Accuracy and Interpretability of Deep Sequential Knowledge Tracing Models [26.294808618068146]
Knowledge tracing plays a crucial role in predicting students' future performance.
Deep neural networks (DNNs) have shown great potential in solving the KT problem.
However, there still exist some important challenges when applying deep learning techniques to model the KT process.
arXiv Detail & Related papers (2024-03-12T05:15:42Z)
- A Survey on Knowledge Distillation of Large Language Models [99.11900233108487]
Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities to open-source models.
This paper presents a comprehensive survey of KD's role within the realm of Large Language Models (LLMs).
arXiv Detail & Related papers (2024-02-20T16:17:37Z)
- Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication [25.653517213641575]
We develop an interactive communication process to help student models on downstream tasks learn effectively from pre-trained foundation models.
Our design is inspired by the way humans learn from teachers who can explain knowledge in a way that meets the students' needs.
arXiv Detail & Related papers (2023-10-04T22:22:21Z)
- Categories of Response-Based, Feature-Based, and Relation-Based Knowledge Distillation [10.899753512019933]
Knowledge Distillation (KD) aims to optimize a lightweight student network under the guidance of a larger teacher network.
KD mainly involves knowledge extraction and distillation strategies.
This paper provides a comprehensive KD survey, including knowledge categories, distillation schemes, and algorithms; the first two knowledge categories are sketched after this entry.
arXiv Detail & Related papers (2023-06-19T03:42:44Z)
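As a hedged illustration of the first two knowledge categories named above (relation-based knowledge, which instead matches pairwise relations between samples, is omitted for brevity), the sketch below assumes PyTorch; the intermediate activations `student_feat`/`teacher_feat` and the 1x1 projection are hypothetical hook points, not part of the surveyed paper.

```python
# Hedged sketch of response-based vs. feature-based distillation losses.
# Assumes PyTorch; the feature tensors and the 1x1 adapter are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

def response_based_loss(student_logits, teacher_logits, T=4.0):
    # Match the teacher's softened output distribution (its "response").
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

class FeatureBasedLoss(nn.Module):
    # Match intermediate feature maps (FitNets-style hint learning); a 1x1
    # convolution projects student features to the teacher's channel width.
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())
```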
- A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training [32.87731973236423]
We focus on Knowledge Distillation (KD) techniques, in which a small student model learns to imitate a large teacher model.
We conduct a systematic study of task-specific KD techniques for various NLG tasks under realistic assumptions.
We propose the Joint-Teaching method, which applies word-level KD to multiple pseudo-targets (PTs) generated by both the teacher and the student; word-level KD is sketched after this entry.
arXiv Detail & Related papers (2023-05-03T10:49:38Z)
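For readers unfamiliar with the term, word-level KD computes a per-token divergence between the teacher's and the student's next-token distributions along a (pseudo-)target sequence. The sketch below is a generic rendition of that idea in PyTorch, not the Joint-Teaching implementation; the logits tensors are hypothetical decoder outputs.

```python
# Generic sketch of word-level KD over one (pseudo-)target sequence.
# Assumes PyTorch; logits are hypothetical decoder outputs of shape
# [sequence_length, vocab_size], aligned position by position.
import torch
import torch.nn.functional as F

def word_level_kd(student_logits, teacher_logits, T=1.0):
    """Average the per-token KL divergence between the teacher's and the
    student's next-token distributions along the target sequence."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    per_token_kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(-1)
    return per_token_kl.mean() * (T * T)
```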
- Knowledge Distillation of Transformer-based Language Models Revisited [74.25427636413067]
Large model size and high run-time latency are serious impediments to applying pre-trained language models in practice.
We propose a unified knowledge distillation framework for transformer-based models.
Our empirical results shed light on distillation for pre-trained language models and show significant improvements over previous state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2022-06-29T02:16:56Z)
- How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy; a generic version of the alignment term is sketched after this entry.
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
arXiv Detail & Related papers (2021-10-22T21:30:53Z)
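The following is a hedged, generic sketch of input-gradient alignment as an auxiliary distillation term, assuming PyTorch; it conveys the idea named above but is not the authors' exact KDIGA formulation, and `student`, `teacher`, and the tensors are placeholders.

```python
# Generic sketch of input-gradient alignment as an auxiliary distillation
# term; an interpretation of the idea, not the authors' exact KDIGA code.
# Assumes PyTorch; `student`, `teacher`, and the tensors are placeholders.
import torch
import torch.nn.functional as F

def input_gradient_alignment(student, teacher, images, labels):
    images = images.clone().detach().requires_grad_(True)

    # Teacher's gradient of the loss w.r.t. the input (no graph kept).
    teacher_grad, = torch.autograd.grad(
        F.cross_entropy(teacher(images), labels), images)

    # Student's input gradient, kept differentiable so the alignment
    # penalty can be backpropagated into the student's weights.
    student_grad, = torch.autograd.grad(
        F.cross_entropy(student(images), labels), images, create_graph=True)

    # Penalize the mismatch, encouraging the student to inherit the
    # teacher's local (robustness-related) input sensitivity.
    return F.mse_loss(student_grad, teacher_grad.detach())
```

In training, this term would typically be added to the usual KD objective with a weighting coefficient.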
- KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation [59.061835562314066]
We introduce a novel task-oriented attention model, termed KDExplainer, to shed light on the working mechanism underlying vanilla KD.
We also introduce a portable tool, dubbed the virtual attention module (VAM), that can be seamlessly integrated with various deep neural networks (DNNs) to enhance their performance under KD.
arXiv Detail & Related papers (2021-05-10T08:15:26Z)
- A Survey of Knowledge Tracing: Models, Variants, and Applications [70.69281873057619]
Knowledge Tracing is one of the fundamental tasks for student behavioral data analysis.
We present three types of fundamental KT models with distinct technical routes.
We discuss potential directions for future research in this rapidly growing field.
arXiv Detail & Related papers (2021-05-06T13:05:55Z)