Knowledge Distillation: A Survey
- URL: http://arxiv.org/abs/2006.05525v7
- Date: Thu, 20 May 2021 13:45:55 GMT
- Title: Knowledge Distillation: A Survey
- Authors: Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao
- Abstract summary: Deep neural networks have been successful in both industry and academia, especially for computer vision tasks.
It is a challenge to deploy these cumbersome deep models on devices with limited resources.
Knowledge distillation effectively learns a small student model from a large teacher model.
- Score: 87.51063304509067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, deep neural networks have been successful in both industry
and academia, especially for computer vision tasks. The great success of deep
learning is mainly due to its scalability to encode large-scale data and to
maneuver billions of model parameters. However, it is a challenge to deploy
these cumbersome deep models on devices with limited resources, e.g., mobile
phones and embedded devices, not only because of their high computational
complexity but also because of their large storage requirements. To this end, a variety of
model compression and acceleration techniques have been developed. As a
representative type of model compression and acceleration, knowledge
distillation effectively learns a small student model from a large teacher
model. It has received rapidly increasing attention from the community. This
paper provides a comprehensive survey of knowledge distillation from the
perspectives of knowledge categories, training schemes, teacher-student
architecture, distillation algorithms, performance comparison and applications.
Furthermore, challenges in knowledge distillation are briefly reviewed, and
directions for future research are discussed.
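As a minimal illustration of the response-based distillation the survey covers, the sketch below trains a student against temperature-softened teacher outputs plus the usual hard-label loss; the temperature T and weight alpha are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Vanilla response-based KD: soft-target KL (scaled by T^2) + hard-label CE.

    T and alpha are illustrative hyperparameters, not values from the survey.
    """
    # Soften both distributions with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between softened teacher and student predictions.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Standard supervised cross-entropy on the hard labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Usage sketch: the teacher is frozen, only the student is updated.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
# loss.backward()
```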
Related papers
- Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models [0.0]
The growing number of connected devices around the world calls for compressed models that can be easily deployed on local devices with limited compute capacity and power.
We applied both quantization and pruning compression techniques to popular deep learning models used for image classification, object detection, language modeling and generative modeling.
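As a rough sketch of the kind of compression pipeline evaluated in such studies (the 30% sparsity and the choice of quantizing only Linear layers are arbitrary assumptions for illustration), PyTorch's built-in pruning and dynamic quantization can be combined as follows:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small stand-in model; any trained classifier would be handled similarly.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Pruning: zero out the 30% smallest-magnitude weights in every Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the sparsity permanent

# Post-training dynamic quantization: Linear weights stored as int8,
# activations quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```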
arXiv Detail & Related papers (2024-07-22T14:20:53Z) - A Comprehensive Review of Knowledge Distillation in Computer Vision [4.9407806800208816]
This review paper examines the current state of research on knowledge distillation, a technique for compressing complex models into smaller and simpler ones.
The paper provides an overview of the major principles and techniques associated with knowledge distillation and reviews the applications of knowledge distillation in the domain of computer vision.
arXiv Detail & Related papers (2024-04-01T05:46:15Z) - Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z) - Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks [0.0]
Large-scale deep models have achieved great success, but the enormous computational complexity and gigantic storage requirements make them difficult to implement in real-time applications.
We present a modified knowledge distillation framework which allows compressing an entire ensemble into the weight space of a single model.
We show that knowledge distillation can aggregate knowledge from multiple teachers into a single student model and, at the same computational complexity, obtain a model that performs better than one trained in the standard manner.
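A common way to realize this kind of multi-teacher aggregation, sketched here under the assumption that the ensemble's softened predictions are simply averaged (not necessarily the paper's exact framework), is:

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.5):
    """Distill an ensemble of teachers into one student by averaging their
    temperature-softened predictions. T and alpha are illustrative choices."""
    # Average the teachers' softened probability distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd_term = F.kl_div(log_student, teacher_probs, reduction="batchmean") * (T * T)
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Usage sketch with a frozen teacher ensemble:
# with torch.no_grad():
#     teacher_logits_list = [t(images) for t in teacher_ensemble]
# loss = multi_teacher_kd_loss(student(images), teacher_logits_list, labels)
```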
arXiv Detail & Related papers (2023-02-14T17:40:36Z) - From Actions to Events: A Transfer Learning Approach Using Improved Deep Belief Networks [1.0554048699217669]
This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model.
Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process.
arXiv Detail & Related papers (2022-11-30T14:47:10Z) - Learnware: Small Models Do Big [69.88234743773113]
The prevailing big-model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed these issues, while becoming a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which aims to spare users from building machine learning models from scratch, in the hope of reusing small models for purposes even beyond their original ones.
arXiv Detail & Related papers (2022-10-07T15:55:52Z) - Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey [53.258091735278875]
This survey covers studies of design automation techniques for deep learning models targeting edge computing.
It offers an overview and comparison of the key metrics commonly used to quantify model proficiency in terms of effectiveness, lightness, and computational cost.
The survey then covers three categories of state-of-the-art design automation techniques for deep models.
arXiv Detail & Related papers (2022-08-22T12:12:43Z) - Neural Architecture Search for Dense Prediction Tasks in Computer Vision [74.9839082859151]
Deep learning has led to a rising demand for neural network architecture engineering.
Neural architecture search (NAS) aims to automatically design neural network architectures in a data-driven manner rather than manually.
NAS has become applicable to a much wider range of problems in computer vision.
arXiv Detail & Related papers (2022-02-15T08:06:50Z) - Knowledge Distillation in Deep Learning and its Applications [0.6875312133832078]
Deep learning models are relatively large, and it is hard to deploy such models on resource-limited devices.
One possible solution is knowledge distillation, whereby a smaller model (the student) is trained using information from a larger model (the teacher).
arXiv Detail & Related papers (2020-07-17T14:43:52Z) - Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
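A hedged sketch of the general idea, assuming the blackbox teacher is a callable that returns class probabilities for queried images (the active-learning selection step is omitted and all names here are illustrative):

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_images(x1, x2, alpha=1.0):
    """Blend two image batches with a Beta-sampled coefficient (standard mixup)."""
    lam = float(np.random.beta(alpha, alpha))
    return lam * x1 + (1.0 - lam) * x2

def distill_step(student, optimizer, blackbox_teacher, x1, x2):
    """Query the blackbox teacher on synthesized mixup images and train the
    student to match its soft predictions. `blackbox_teacher` is assumed to
    return class probabilities; this is an illustrative sketch, not the
    paper's full active-mixup procedure."""
    mixed = mixup_images(x1, x2)
    with torch.no_grad():
        teacher_probs = blackbox_teacher(mixed)  # soft labels from the blackbox
    log_student = F.log_softmax(student(mixed), dim=1)
    loss = F.kl_div(log_student, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```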
arXiv Detail & Related papers (2020-03-31T05:44:55Z)