Distilling Knowledge via Knowledge Review
- URL: http://arxiv.org/abs/2104.09044v1
- Date: Mon, 19 Apr 2021 04:36:24 GMT
- Title: Distilling Knowledge via Knowledge Review
- Authors: Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia
- Abstract summary: We study the factor of connection paths across levels between teacher and student networks, and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
The resulting nested and compact framework requires negligible overhead and outperforms other methods on a variety of tasks.
- Score: 69.15050871776552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation transfers knowledge from the teacher network to the
student one, with the goal of greatly improving the performance of the student
network. Previous methods mostly focus on proposing feature transformations and
loss functions between features at the same level to improve effectiveness. We
instead study the factor of connection paths across levels between teacher and
student networks, and reveal their great importance. For the first time in
knowledge distillation, cross-stage connection paths are proposed. Our new
review mechanism is effective and structurally simple. The resulting nested and
compact framework requires negligible computation overhead and outperforms
other methods on a variety of tasks. We apply our method to classification,
object detection, and instance segmentation tasks, and all of them show
significant improvements in student network performance. Code is available
at https://github.com/Jia-Research-Lab/ReviewKD
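For intuition, here is a minimal sketch (in PyTorch) of what a cross-stage, review-style feature distillation loss can look like: each teacher stage is matched not only by the student feature at the same stage but by a fusion of the student's same-stage and deeper features. The 1x1-conv fusion and plain L2 matching below are simplifying assumptions for illustration; the actual ReviewKD framework uses dedicated attention-based fusion and hierarchical context loss modules (see the linked repository).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossStageReviewLoss(nn.Module):
    """Illustrative cross-stage (review-style) feature distillation loss."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # For teacher stage i, fuse the student's stage-i and deeper features
        # with a 1x1 conv (a simplification; ReviewKD uses attention-based
        # fusion and a hierarchical context loss instead of plain L2).
        self.fuse = nn.ModuleList(
            nn.Conv2d(sum(student_channels[i:]), tc, kernel_size=1)
            for i, tc in enumerate(teacher_channels)
        )

    def forward(self, student_feats, teacher_feats):
        loss = 0.0
        for i, t in enumerate(teacher_feats):
            # Resize the student's stage-i and deeper maps to stage-i size,
            # concatenate them, and project to the teacher's channel width.
            resized = [
                F.interpolate(s, size=t.shape[-2:], mode="bilinear",
                              align_corners=False)
                for s in student_feats[i:]
            ]
            fused = self.fuse[i](torch.cat(resized, dim=1))
            loss = loss + F.mse_loss(fused, t)
        return loss / len(teacher_feats)
```

In training, such a term would typically be added to the ordinary task loss with a weighting factor.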
Related papers
- Review helps learn better: Temporal Supervised Knowledge Distillation [9.220654594406508]
We find that during network training, the evolution of feature maps follows a temporal sequence property.
Inspired by this observation, we propose Temporal Supervised Knowledge Distillation (TSKD).
arXiv Detail & Related papers (2023-07-03T07:51:08Z)
- On effects of Knowledge Distillation on Transfer Learning [0.0]
We propose a machine learning architecture we call TL+KD that combines knowledge distillation with transfer learning.
We show that, with guidance and knowledge from a larger teacher network during fine-tuning, the student network achieves better validation performance, such as higher accuracy.
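As a rough, generic illustration of fine-tuning with teacher guidance (not necessarily the exact TL+KD formulation), the student can be trained with the usual cross-entropy on labels plus a softened-logit distillation term; the `alpha` weighting and the temperature below are assumed values for the sketch.

```python
import torch.nn.functional as F

def finetune_kd_loss(student_logits, teacher_logits, labels,
                     alpha=0.5, temperature=4.0):
    """Cross-entropy on ground truth plus softened-logit distillation.

    A generic formulation; alpha and temperature are illustrative, not
    the exact TL+KD settings.
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale the softened-target term
    return (1.0 - alpha) * ce + alpha * kd
```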
arXiv Detail & Related papers (2022-10-18T08:11:52Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing distillation methods based on defined knowledge representations.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Students are the Best Teacher: Exit-Ensemble Distillation with Multi-Exits [25.140055086630838]
This paper proposes a novel knowledge distillation-based learning method to improve the classification performance of convolutional neural networks (CNNs).
Unlike the conventional notion of distillation where teachers only teach students, we show that students can also help other students and even the teacher to learn better.
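A hedged sketch of one way exits in a multi-exit network can teach one another: the averaged (ensembled) exit logits act as a soft target for every exit, including the final one, on top of the usual cross-entropy. The averaging rule, temperature, and `beta` weight are illustrative assumptions rather than the paper's exact exit-ensemble formulation.

```python
import torch
import torch.nn.functional as F

def exit_ensemble_loss(exit_logits, labels, temperature=3.0, beta=1.0):
    """Every exit of a multi-exit CNN learns from the ensemble of all exits.

    exit_logits: list of logit tensors, one per exit (final exit included).
    The averaging rule and loss weights are illustrative assumptions.
    """
    # Ensemble teaching signal: average of all exit logits, no gradient.
    ensemble = torch.stack(exit_logits, dim=0).mean(dim=0).detach()
    loss = 0.0
    for logits in exit_logits:
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(
            F.log_softmax(logits / temperature, dim=1),
            F.softmax(ensemble / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2
        loss = loss + ce + beta * kd
    return loss / len(exit_logits)
```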
arXiv Detail & Related papers (2021-04-01T07:10:36Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search [60.965024145243596]
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to their high efficiency and competitive performance.
However, weight sharing leaves the candidate sub-networks insufficiently trained; to alleviate this problem, we present a simple yet effective architecture distillation method.
We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training.
Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
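A minimal sketch of maintaining such a prioritized-path board: sampled architectures enter or leave the board on the fly according to a scalar performance score, and the best member can act as the distillation teacher for the currently sampled sub-network. The fixed capacity, score-only ranking, and teacher selection rule are simplifying assumptions; the method also weighs complexity when choosing paths.

```python
class PrioritizedPathBoard:
    """Keep the best-performing sampled architectures seen so far.

    A simplified sketch: paths are ranked by a single validation score;
    the actual method also accounts for complexity when matching a
    teacher path to the current student path.
    """

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.board = []  # list of (score, architecture) pairs

    def update(self, architecture, score):
        # Membership changes on the fly as better candidates appear.
        self.board.append((score, architecture))
        self.board.sort(key=lambda item: item[0], reverse=True)
        self.board = self.board[: self.capacity]

    def pick_teacher(self):
        # The best prioritized path serves as the distillation teacher
        # for the sub-network sampled in the current training step.
        return self.board[0][1] if self.board else None
```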
arXiv Detail & Related papers (2020-10-29T17:55:05Z)
- Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive learning materials generation and teacher/student update can be conducted more than one time, improving the network capability iteratively.
arXiv Detail & Related papers (2020-08-26T06:39:24Z)
- Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
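A hedged sketch of what a swapping-in operation could look like: during a forward pass, some teacher blocks are randomly replaced by the corresponding student blocks, so the student blocks are optimized inside the teacher's feature pathway. The swap probability, the uniform random choice, and the assumption of shape-compatible blocks are illustrative, not the exact IAKD design.

```python
import random
import torch

def swapped_forward(teacher_blocks, student_blocks, x, swap_prob=0.5):
    """Forward pass through the teacher with some blocks swapped in
    from the student (assumes the two networks have block-wise
    compatible feature shapes; swap_prob is an illustrative choice).
    """
    for t_block, s_block in zip(teacher_blocks, student_blocks):
        if random.random() < swap_prob:
            x = s_block(x)          # student block fills in for the teacher's
        else:
            with torch.no_grad():   # teacher blocks stay frozen
                x = t_block(x)
    return x
```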
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.