Multi-teacher knowledge distillation as an effective method for
compressing ensembles of neural networks
- URL: http://arxiv.org/abs/2302.07215v1
- Date: Tue, 14 Feb 2023 17:40:36 GMT
- Title: Multi-teacher knowledge distillation as an effective method for
compressing ensembles of neural networks
- Authors: Konrad Zuchniak
- Abstract summary: Large-scale deep models have achieved great success, but the enormous computational complexity and gigantic storage requirements make them difficult to implement in real-time applications.
We present a modified knowledge distillation framework which allows compressing the entire ensemble model into the weight space of a single model.
We show that knowledge distillation can aggregate knowledge from multiple teachers in only one student model and, with the same computational complexity, obtain a better-performing model compared to a model trained in the standard manner.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has contributed greatly to many successes in artificial
intelligence in recent years. Today, it is possible to train models that have
thousands of layers and hundreds of billions of parameters. Large-scale deep
models have achieved great success, but the enormous computational complexity
and gigantic storage requirements make it extremely difficult to implement them
in real-time applications. On the other hand, the size of the dataset is still
a real problem in many domains. Data are often missing, too expensive, or
impossible to obtain for other reasons. Ensemble learning partially addresses
the problems of small datasets and overfitting. However, ensemble learning in
its basic form entails a linear increase in computational complexity. We
analyzed the impact of the ensemble decision-fusion mechanism and compared
various methods of combining the decisions, including voting algorithms. We
used a modified knowledge distillation framework as the decision-fusion
mechanism, which additionally allows compressing the entire ensemble into the
weight space of a single model. We showed that knowledge distillation can
aggregate knowledge from multiple teachers into a single student model and,
at the same computational complexity, obtain a better-performing model than
one trained in the standard manner. We developed our own method for mimicking
the responses of all teachers simultaneously. We tested these solutions on
several benchmark datasets. Finally, we presented practical applications of
the efficient multi-teacher knowledge distillation framework. In the first
example, we used knowledge distillation to develop models that automate
corrosion detection on aircraft fuselage. The second example describes smoke
detection on observation cameras to help counteract forest wildfires.
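To make the idea concrete, below is a minimal sketch of how several teachers can be distilled into one student by averaging their softened outputs and matching the student to the fused distribution. The function name `multi_teacher_kd_loss`, the temperature, and the mixing weight `alpha` are illustrative assumptions, not the paper's exact fusion method.

```python
# Minimal sketch of multi-teacher knowledge distillation (illustrative only).
# Averaging the teachers' softened outputs is a common baseline; the paper's
# own decision-fusion method may differ. All names below are assumptions.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=4.0, alpha=0.5):
    """Combine a soft loss against the averaged teacher distribution
    with the usual hard-label cross-entropy."""
    # Soften and average the teachers' predictive distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student and the fused teacher distribution,
    # scaled by T^2 as in standard distillation.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * temperature ** 2

    # Standard supervised loss on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    batch, classes, n_teachers = 8, 10, 3
    student_logits = torch.randn(batch, classes, requires_grad=True)
    teacher_logits = [torch.randn(batch, classes) for _ in range(n_teachers)]
    labels = torch.randint(0, classes, (batch,))
    loss = multi_teacher_kd_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(f"multi-teacher KD loss: {loss.item():.4f}")
```

Averaging the softened teacher distributions is only one possible decision-fusion choice; weighted or voting-based fusion, as discussed in the abstract, could be swapped in at the `teacher_probs` step.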
Related papers
- BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- From Actions to Events: A Transfer Learning Approach Using Improved Deep Belief Networks [1.0554048699217669]
This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model.
Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process.
arXiv Detail & Related papers (2022-11-30T14:47:10Z)
- Multi-Scale Aligned Distillation for Low-Resolution Detection [68.96325141432078]
This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
arXiv Detail & Related papers (2021-09-14T12:53:35Z)
- Multi-Robot Deep Reinforcement Learning for Mobile Navigation [82.62621210336881]
We propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt).
At training time, HInt learns separate perception and dynamics models, and at test time, HInt integrates the two models in a hierarchical manner and plans actions with the integrated model.
Our mobile navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.
arXiv Detail & Related papers (2021-06-24T19:07:40Z)
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
- Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to each teacher model throughout distillation, and most existing methods allocate an equal weight to every teacher.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models.
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
- Knowledge Distillation in Deep Learning and its Applications [0.6875312133832078]
Deep learning models are relatively large, and it is hard to deploy such models on resource-limited devices.
One possible solution is knowledge distillation, whereby a smaller model (the student model) is trained by utilizing information from a larger model (the teacher model).
arXiv Detail & Related papers (2020-07-17T14:43:52Z)
- Knowledge Distillation: A Survey [87.51063304509067]
Deep neural networks have been successful in both industry and academia, especially for computer vision tasks.
It is a challenge to deploy these cumbersome deep models on devices with limited resources.
Knowledge distillation effectively learns a small student model from a large teacher model.
arXiv Detail & Related papers (2020-06-09T21:47:17Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
- Auto-Ensemble: An Adaptive Learning Rate Scheduling based Deep Learning Model Ensembling [11.324407834445422]
This paper proposes Auto-Ensemble (AE) to collect checkpoints of a deep learning model and ensemble them automatically.
The advantage of this method is that it makes the model converge to various local optima by scheduling the learning rate within a single training run.
arXiv Detail & Related papers (2020-03-25T08:17:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.