Extracting knowledge from features with multilevel abstraction
- URL: http://arxiv.org/abs/2112.13642v1
- Date: Sat, 4 Dec 2021 02:25:46 GMT
- Title: Extracting knowledge from features with multilevel abstraction
- Authors: Jinhong Lin, Zhaoyang Li
- Abstract summary: Knowledge distillation aims at transferring the knowledge from a large teacher model to a small student model; in self-knowledge distillation (SKD), the student model serves as its own teacher.
In this paper, we propose a novel SKD method that differs from the mainstream approaches.
Experiments and ablation studies show its effectiveness and generalization across various kinds of tasks.
- Score: 3.4443503349903124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation aims at transferring the knowledge from a large
teacher model to a small student model, greatly improving the student's
performance. The student network can then replace the teacher network for
deployment on low-resource devices, thanks to its improved performance, smaller
number of parameters, and shorter inference time. Self-knowledge distillation
(SKD), in which a student model distills knowledge from itself rather than from
a separate teacher, has recently attracted great attention. To the best of our
knowledge, existing SKD methods fall into two main streams: data augmentation
and refined-knowledge auxiliary. In this paper, we propose a novel SKD method
that departs from both mainstream approaches: it distills knowledge from
features at multiple levels of abstraction. Experiments and ablation studies
show its effectiveness and generalization across various tasks and model
architectures. Our code has been released on GitHub.
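Since the abstract gives only the high-level idea, the sketch below illustrates it under stated assumptions: a standard soft-target distillation loss, plus a hypothetical multilevel term in which shallower feature maps of the same network mimic its own deepest representation. The function names, pooling scheme, and temperature are illustrative, not the authors' implementation.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Classic soft-target distillation: KL between softened distributions."""
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def multilevel_self_distillation(feats):
    """Hypothetical SKD term: shallow levels mimic the deepest level.

    `feats` is a list of (N, C, H, W) feature maps from different depths,
    assumed already projected to a common channel width C.
    """
    vecs = [F.normalize(f.mean(dim=(2, 3)), dim=1) for f in feats]  # pool to (N, C)
    target = vecs[-1].detach()  # deepest abstraction acts as the teacher
    return sum(F.mse_loss(v, target) for v in vecs[:-1])
```

In such a scheme, the multilevel term would simply be added to the usual cross-entropy objective with a small weighting coefficient.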
Related papers
- LAKD-Activation Mapping Distillation Based on Local Learning [12.230042188890838]
This paper proposes a novel knowledge distillation framework, Local Attention Knowledge Distillation (LAKD)
LAKD more efficiently utilizes the distilled information from teacher networks, achieving higher interpretability and competitive performance.
We conducted experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets, and the results show that our LAKD method significantly outperforms existing methods.
arXiv Detail & Related papers (2024-08-21T09:43:27Z)
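The LAKD abstract above does not spell out its loss; the following is a minimal sketch of the activation-map-matching ingredient its title suggests, in the spirit of attention transfer. The names and normalization choices are assumptions.

```python
import torch.nn.functional as F

def activation_map(feat):
    """Collapse a (N, C, H, W) feature map into a normalized spatial map."""
    amap = feat.pow(2).mean(dim=1)               # (N, H, W)
    return F.normalize(amap.flatten(1), dim=1)   # unit norm per sample

def activation_map_kd_loss(student_feat, teacher_feat):
    """Match the student's activation map to the teacher's (teacher detached)."""
    return F.mse_loss(activation_map(student_feat),
                      activation_map(teacher_feat.detach()))
```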
- ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation [3.301728339780329]
We propose an innovative method to boost Knowledge Distillation efficiency without the need for resource-heavy teacher models.
In our work, we propose an efficient method for generating soft labels, thereby eliminating the need for a large teacher model.
Our experiments on various datasets, including CIFAR-100, Tiny Imagenet, and Fashion MNIST, demonstrate the superior resource efficiency of our approach.
arXiv Detail & Related papers (2024-04-15T15:54:30Z)
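As a rough illustration of teacher-free soft labels, the sketch below derives a class-similarity table from autoencoder latent codes. The exact recipe ReffAKD uses is not given in this summary, so the centroid-and-cosine construction here is an assumption.

```python
import torch
import torch.nn.functional as F

def soft_labels_from_latents(latents, labels, num_classes, T=0.5):
    """Build a (num_classes, num_classes) soft-label table from latent codes.

    Centroid-per-class plus cosine similarity is an assumed construction;
    row c would then serve as the soft target for every example of class c.
    """
    centroids = torch.stack(
        [latents[labels == c].mean(dim=0) for c in range(num_classes)])
    sim = F.cosine_similarity(
        centroids.unsqueeze(1), centroids.unsqueeze(0), dim=-1)
    return F.softmax(sim / T, dim=1)
```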
- Cooperative Knowledge Distillation: A Learner Agnostic Approach [15.414204257189596]
We formulate a novel form of knowledge distillation in which many models can act as both students and teachers.
Because different models may have different strengths and weaknesses, all models can act as either students or teachers.
arXiv Detail & Related papers (2024-02-02T17:31:50Z)
- AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression [26.474962405945316]
We present a novel attribution-driven knowledge distillation approach to compress pre-trained language models.
To enhance the knowledge transfer of model reasoning and generalization, we explore multi-view attribution distillation on all potential decisions of the teacher.
arXiv Detail & Related papers (2023-05-17T07:40:12Z)
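A sketch of how attribution-driven distillation could look: compute a per-token attribution for teacher and student and align them. The gradient-times-input saliency is a cheap stand-in for the paper's attribution method, and `logit_fn` is a hypothetical callable, not part of any stated API.

```python
import torch
import torch.nn.functional as F

def saliency(logit_fn, embeddings, target_ids):
    """Per-token gradient-x-input attribution; `logit_fn` is a hypothetical
    callable mapping (N, seq_len, hidden) embeddings to (N, num_classes) logits."""
    emb = embeddings.detach().requires_grad_(True)
    score = logit_fn(emb).gather(1, target_ids.unsqueeze(1)).sum()
    grad, = torch.autograd.grad(score, emb, create_graph=True)
    return (grad * emb).sum(dim=-1)              # (N, seq_len)

def attribution_kd_loss(student_attr, teacher_attr):
    """Match normalized per-token attribution profiles."""
    return F.mse_loss(F.normalize(student_attr, dim=1),
                      F.normalize(teacher_attr.detach(), dim=1))
```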
- Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks.
arXiv Detail & Related papers (2022-11-23T19:27:48Z)
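A minimal sketch of instance-instance relation matching in the embedding space: build a batch-level similarity graph for teacher and student and align the two. The cosine edges and MSE alignment are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def relation_graph(embeddings):
    """Instance-instance edges: pairwise cosine similarities within a batch."""
    z = F.normalize(embeddings, dim=1)
    return z @ z.t()                             # (N, N)

def graph_alignment_loss(student_emb, teacher_emb):
    """Align the student's relation graph with the (detached) teacher's."""
    return F.mse_loss(relation_graph(student_emb),
                      relation_graph(teacher_emb.detach()))
```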
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Revisiting Knowledge Distillation: An Inheritance and Exploration Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD)
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z)
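A sketch of the inheritance/exploration split as the abstract describes it: one part of the student's features is pulled toward the teacher while another is pushed away. The channel split and cosine-based losses are assumptions, not the paper's exact objectives.

```python
import torch.nn.functional as F

def ie_kd_loss(student_feat, teacher_feat, lam=1.0):
    """Half of the student's channels inherit (pull toward the teacher),
    the other half explore (push away). Assumes `teacher_feat` was already
    projected to half the student's channel width by an adapter."""
    c = student_feat.size(1) // 2
    t = F.normalize(teacher_feat.detach().flatten(1), dim=1)
    inherit = F.normalize(student_feat[:, :c].flatten(1), dim=1)
    explore = F.normalize(student_feat[:, c:].flatten(1), dim=1)
    inherit_loss = 1.0 - (inherit * t).sum(dim=1).mean()   # maximize similarity
    explore_loss = (explore * t).sum(dim=1).mean()         # minimize similarity
    return inherit_loss + lam * explore_loss
```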
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose a collaborative teacher-student learning approach via multiple knowledge transfer (CTSL-MKT)
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
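One ingredient of this setup, instance-level knowledge exchanged between peer students, can be sketched as symmetric mutual-learning KL terms; the instance-relation term would resemble the graph sketch above. The temperature and detachment choices are assumptions.

```python
import torch.nn.functional as F

def mutual_kd_loss(logits_a, logits_b, T=3.0):
    """Symmetric peer-to-peer soft-label exchange between two students."""
    def kl(teacher_logits, student_logits):
        p = F.softmax(teacher_logits.detach() / T, dim=1)
        log_q = F.log_softmax(student_logits / T, dim=1)
        return F.kl_div(log_q, p, reduction="batchmean") * T * T
    return kl(logits_b, logits_a) + kl(logits_a, logits_b)
```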
- Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Existing methods assign a fixed weight to each teacher model throughout the distillation, and most allocate an equal weight to every teacher.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models.
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
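The paper learns per-example teacher weights with reinforcement learning; the sketch below only shows how such weights would enter a multi-teacher distillation loss, taking the weights as given rather than implementing the selector.

```python
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, weights, T=3.0):
    """Weighted multi-teacher KD; `weights` is (N, num_teachers), e.g. produced
    by a learned per-example selector (taken as an input here)."""
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    loss = 0.0
    for k, t_logits in enumerate(teacher_logits_list):
        p_t = F.softmax(t_logits.detach() / T, dim=1)
        kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)  # per-example KL
        loss = loss + (weights[:, k] * kl).mean()
    return loss * T * T
```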
- A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models [3.770437296936382]
We review the characteristics of knowledge distillation from the hypothesis that its three important ingredients are the distilled knowledge and loss, the teacher-student paradigm, and the distillation process.
We also present future directions for knowledge distillation, including explainable knowledge distillation, where the source of the performance gain is analyzed, and self-supervised learning, which is a hot research topic in the deep learning community.
arXiv Detail & Related papers (2020-11-30T05:22:02Z)
- Knowledge Distillation: A Survey [87.51063304509067]
Deep neural networks have been successful in both industry and academia, especially for computer vision tasks.
It is a challenge to deploy these cumbersome deep models on devices with limited resources.
Knowledge distillation effectively learns a small student model from a large teacher model.
arXiv Detail & Related papers (2020-06-09T21:47:17Z)