A Selective Survey on Versatile Knowledge Distillation Paradigm for
Neural Network Models
- URL: http://arxiv.org/abs/2011.14554v1
- Date: Mon, 30 Nov 2020 05:22:02 GMT
- Title: A Selective Survey on Versatile Knowledge Distillation Paradigm for
Neural Network Models
- Authors: Jeong-Hoe Ku, JiHun Oh, YoungYoon Lee, Gaurav Pooniwala, SangJeong Lee
- Abstract summary: We review the characteristics of knowledge distillation from the hypothesis that its three important ingredients are the distilled knowledge and loss, the teacher-student paradigm, and the distillation process.
We also present directions for future work in knowledge distillation, including explainable knowledge distillation, where the performance gain is analyzed analytically, and self-supervised learning, which is a hot research topic in the deep learning community.
- Score: 3.770437296936382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to provide a selective survey of the knowledge
distillation (KD) framework so that researchers and practitioners can take
advantage of it for developing new optimized models in the deep neural
network field. To this end, we give a brief overview of knowledge
distillation and some related works, including learning using privileged
information (LUPI) and generalized distillation (GD). Even though knowledge
distillation based on the teacher-student architecture was initially devised
as a model compression technique, it has found versatile applications across
various frameworks.
In this paper, we review the characteristics of knowledge distillation from
the hypothesis that its three important ingredients are the distilled
knowledge and loss, the teacher-student paradigm, and the distillation
process. In addition, we survey the versatility of knowledge distillation by
studying its direct applications and its usage in combination with other
deep learning paradigms. Finally, we present some directions for future work
in knowledge distillation, including explainable knowledge distillation,
where the performance gain is analyzed analytically, and self-supervised
learning, which is a hot research topic in the deep learning community.
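As a concrete illustration of the "distilled knowledge and loss" ingredient discussed in the abstract, the following is a minimal PyTorch sketch of the canonical temperature-softened distillation loss (soft teacher targets combined with hard-label cross-entropy). The temperature T and weight alpha are illustrative hyperparameters, not values taken from the survey.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style distillation loss: KL divergence between temperature-softened
    teacher/student distributions, mixed with cross-entropy on the true labels.
    Illustrative sketch; T and alpha are assumed hyperparameters."""
    # Soften both distributions with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # Scale the KL term by T^2 so its gradients stay comparable to the hard loss.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground truth.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In a typical training step the teacher runs in eval mode under torch.no_grad() and only the student's parameters are updated.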
Related papers
- Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection [47.0507287491627]
We propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection.
By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model.
Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources.
arXiv Detail & Related papers (2024-06-11T06:51:02Z)
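The entry above attributes the knowledge uncertainty to Monte Carlo dropout. Below is a generic, classification-style sketch of how a teacher's predictive uncertainty can be estimated with MC dropout; the function name, the entropy-based uncertainty measure, and the number of samples are assumptions for illustration, not the feature-based detection pipeline of the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mc_dropout_teacher(teacher, x, n_samples=8, T=4.0):
    """Estimate soft targets and a per-sample uncertainty with MC dropout.
    Hypothetical sketch; the paper's exact uncertainty measure and how it is
    injected into the detection loss are not given in the summary above."""
    teacher.train()  # keep dropout layers active at inference time
    probs = torch.stack(
        [F.softmax(teacher(x) / T, dim=-1) for _ in range(n_samples)]
    )  # shape: (n_samples, batch, num_classes)
    mean_probs = probs.mean(dim=0)
    # Predictive entropy of the averaged distribution as an uncertainty proxy.
    uncertainty = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum(dim=-1)
    teacher.eval()
    return mean_probs, uncertainty
```

One plausible use of the returned uncertainty is to down-weight the distillation term on samples where the teacher is unsure, but that weighting scheme is also an assumption rather than the method described in the entry.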
- Knowledge Distillation via Token-level Relationship Graph [12.356770685214498]
We propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG).
By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model.
We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-20T08:16:37Z)
- AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression [26.474962405945316]
We present a novel attribution-driven knowledge distillation approach to compress pre-trained language models.
To enhance the knowledge transfer of model reasoning and generalization, we explore multi-view attribution distillation on all potential decisions of the teacher.
arXiv Detail & Related papers (2023-05-17T07:40:12Z)
- A Survey on Recent Teacher-student Learning Studies [0.0]
Knowledge distillation is a method of transferring the knowledge from a complex deep neural network (DNN) to a smaller and faster DNN.
Recent variants of knowledge distillation include teaching assistant distillation, curriculum distillation, mask distillation, and decoupling distillation.
arXiv Detail & Related papers (2023-04-10T14:30:28Z)
- Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection [35.515135913846386]
We propose an actor-critic approach to selecting appropriate knowledge to transfer during the process of knowledge distillation.
Experimental results on the GLUE datasets show that our method outperforms several strong knowledge distillation baselines significantly.
arXiv Detail & Related papers (2023-02-01T13:40:19Z)
- Knowledge-augmented Deep Learning and Its Applications: A Survey [60.221292040710885]
Knowledge-augmented deep learning (KADL) aims to identify domain knowledge and integrate it into deep models for data-efficient, generalizable, and interpretable deep learning.
This survey subsumes existing works and offers a bird's-eye view of research in the general area of knowledge-augmented deep learning.
arXiv Detail & Related papers (2022-11-30T03:44:15Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation approaches on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z)
- A Closer Look at Knowledge Distillation with Features, Logits, and Gradients [81.39206923719455]
Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another.
This work provides a new perspective to motivate a set of knowledge distillation strategies by approximating the classical KL-divergence criteria with different knowledge sources.
Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for the model design.
arXiv Detail & Related papers (2022-03-18T21:26:55Z)
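The entry above contrasts logits, features, and gradients as knowledge sources. To set a feature-based source next to the logit-based loss sketched after the abstract, here is a generic FitNets-style hint loss; it is an illustrative stand-in, not the specific KL-divergence-based criterion derived in the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureHintLoss(nn.Module):
    """Generic feature-based distillation term (FitNets-style hint loss).
    Shown only to illustrate a feature knowledge source; the class and
    argument names are assumptions for this sketch."""
    def __init__(self, student_dim, teacher_dim):
        super().__init__()
        # Linear projection so student features can match the teacher's width.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat, teacher_feat):
        # Regress projected student features onto the detached teacher features.
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())
```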
- Extracting knowledge from features with multilevel abstraction [3.4443503349903124]
Self-knowledge distillation (SKD) aims at transferring the knowledge from a large teacher model to a small student model.
In this paper, we propose a novel SKD method that takes a different approach from the mainstream methods.
Experiments and ablation studies show its great effectiveness and generalization on various kinds of tasks.
arXiv Detail & Related papers (2021-12-04T02:25:46Z)
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
- Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning [93.18238573921629]
We study how an ensemble of deep learning models can improve test accuracy, and how the superior performance of the ensemble can be distilled into a single model.
We show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory.
We prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.
arXiv Detail & Related papers (2020-12-17T18:34:45Z)
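The entry above interprets self-distillation as an implicit combination of ensembling and knowledge distillation. The sketch below shows a generic born-again-style self-distillation round in which a frozen copy of the model serves as its own teacher; it illustrates the setup being analyzed, not a procedure prescribed by the paper.

```python
import copy
import torch
import torch.nn.functional as F

def self_distillation_round(model, loader, optimizer, T=4.0, alpha=0.5, device="cpu"):
    """One round of born-again-style self-distillation (illustrative sketch):
    a frozen snapshot of the current model acts as the teacher for another
    pass over the training data."""
    teacher = copy.deepcopy(model).eval()
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = model(x)
        # Soft-target term against the frozen snapshot, plus the hard-label term.
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                      F.softmax(t_logits / T, dim=-1),
                      reduction="batchmean") * T * T
        ce = F.cross_entropy(s_logits, y)
        loss = alpha * kd + (1.0 - alpha) * ce
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```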