Multi-teacher knowledge distillation as an effective method for
compressing ensembles of neural networks
- URL: http://arxiv.org/abs/2302.07215v1
- Date: Tue, 14 Feb 2023 17:40:36 GMT
- Title: Multi-teacher knowledge distillation as an effective method for
compressing ensembles of neural networks
- Authors: Konrad Zuchniak
- Abstract summary: Large-scale deep models have achieved great success, but the enormous computational complexity and gigantic storage requirements make them difficult to implement in real-time applications.
We present a modified knowledge distillation framework which allows compressing the entire ensemble model into the weight space of a single model.
We show that knowledge distillation can aggregate knowledge from multiple teachers in only one student model and, with the same computational complexity, obtain a better-performing model compared to a model trained in the standard manner.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has contributed greatly to many successes in artificial
intelligence in recent years. Today, it is possible to train models that have
thousands of layers and hundreds of billions of parameters. Large-scale deep
models have achieved great success, but the enormous computational complexity
and gigantic storage requirements make it extremely difficult to implement them
in real-time applications. On the other hand, the size of the dataset is still
a real problem in many domains. Data are often missing, too expensive, or
impossible to obtain for other reasons. Ensemble learning partially addresses
the problems of small datasets and overfitting. However, ensemble learning in
its basic form entails a linear increase in computational complexity. We
analyzed the impact of the ensemble decision-fusion mechanism and compared
various methods of combining the decisions, including voting algorithms. We
used a modified knowledge distillation framework as the decision-fusion
mechanism, which additionally allows compressing the entire ensemble into the
weight space of a single model. We showed that knowledge distillation can
aggregate knowledge from multiple teachers into a single student model and,
at the same computational complexity, obtain a better-performing model than
one trained in the standard manner. We developed our own method for mimicking
the responses of all teachers simultaneously. We tested these solutions on
several benchmark datasets. Finally, we presented practical applications of
the efficient multi-teacher knowledge distillation framework. In the first
example, we used knowledge distillation to develop models that automate
corrosion detection on aircraft fuselage. The second example describes smoke
detection on observation cameras to help counteract forest wildfires.
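To make the idea concrete, below is a minimal sketch of how several teachers can be distilled into one student by averaging their softened outputs and matching the student to the fused distribution. The function name `multi_teacher_kd_loss`, the temperature, and the mixing weight `alpha` are illustrative assumptions, not the paper's exact fusion method.

```python
# Minimal sketch of multi-teacher knowledge distillation (illustrative only).
# Averaging the teachers' softened outputs is a common baseline; the paper's
# own decision-fusion method may differ. All names below are assumptions.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=4.0, alpha=0.5):
    """Combine a soft loss against the averaged teacher distribution
    with the usual hard-label cross-entropy."""
    # Soften and average the teachers' predictive distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student and the fused teacher distribution,
    # scaled by T^2 as in standard distillation.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * temperature ** 2

    # Standard supervised loss on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    batch, classes, n_teachers = 8, 10, 3
    student_logits = torch.randn(batch, classes, requires_grad=True)
    teacher_logits = [torch.randn(batch, classes) for _ in range(n_teachers)]
    labels = torch.randint(0, classes, (batch,))
    loss = multi_teacher_kd_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(f"multi-teacher KD loss: {loss.item():.4f}")
```

Averaging the softened teacher distributions is only one possible decision-fusion choice; weighted or voting-based fusion, as discussed in the abstract, could be swapped in at the `teacher_probs` step.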
Related papers
- BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- From Actions to Events: A Transfer Learning Approach Using Improved Deep Belief Networks [1.0554048699217669]
This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model.
Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process.
arXiv Detail & Related papers (2022-11-30T14:47:10Z)
- Multi-Scale Aligned Distillation for Low-Resolution Detection [68.96325141432078]
This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
arXiv Detail & Related papers (2021-09-14T12:53:35Z)
- Multi-Robot Deep Reinforcement Learning for Mobile Navigation [82.62621210336881]
We propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt).
At training time, HInt learns separate perception and dynamics models, and at test time, HInt integrates the two models in a hierarchical manner and plans actions with the integrated model.
Our mobile navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.
arXiv Detail & Related papers (2021-06-24T19:07:40Z)
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
- Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to each teacher model throughout distillation, and most existing methods allocate an equal weight to every teacher.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models.
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
- Knowledge Distillation in Deep Learning and its Applications [0.6875312133832078]
Deep learning models are relatively large, and it is hard to deploy such models on resource-limited devices.
One possible solution is knowledge distillation, whereby a smaller model (the student model) is trained by utilizing information from a larger model (the teacher model).
arXiv Detail & Related papers (2020-07-17T14:43:52Z)
- Knowledge Distillation: A Survey [87.51063304509067]
Deep neural networks have been successful in both industry and academia, especially for computer vision tasks.
It is a challenge to deploy these cumbersome deep models on devices with limited resources.
Knowledge distillation effectively learns a small student model from a large teacher model.
arXiv Detail & Related papers (2020-06-09T21:47:17Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
- Auto-Ensemble: An Adaptive Learning Rate Scheduling based Deep Learning Model Ensembling [11.324407834445422]
This paper proposes Auto-Ensemble (AE) to collect checkpoints of a deep learning model and ensemble them automatically.
The advantage of this method is that it makes the model converge to various local optima by scheduling the learning rate within a single training run.
arXiv Detail & Related papers (2020-03-25T08:17:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.