Spirit Distillation: Precise Real-time Prediction with Insufficient Data
- URL: http://arxiv.org/abs/2103.13733v1
- Date: Thu, 25 Mar 2021 10:23:30 GMT
- Title: Spirit Distillation: Precise Real-time Prediction with Insufficient Data
- Authors: Zhiyuan Wu, Hong Qi, Yu Jiang, Chupeng Cui, Zongmin Yang, Xinhui Xue
- Abstract summary: We propose a new training framework named Spirit Distillation (SD).
It extends the ideas of fine-tuning-based transfer learning (FTT) and feature-based knowledge distillation.
Results demonstrate boosts in segmentation (mIoU) and high-precision accuracy of 1.4% and 8.2%, respectively.
- Score: 4.6247655021017655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent trends demonstrate the effectiveness of deep neural networks (DNNs)
applied to environment perception in autonomous driving systems. While large-scale,
complete data can train fine DNNs, collecting such data is difficult, expensive, and
time-consuming. In addition, the significance of both accuracy and efficiency cannot be
over-emphasized, given the requirement of real-time recognition. To alleviate the
conflict between scarce data and the high computational consumption of DNNs, we propose
a new training framework named Spirit Distillation (SD). It extends the ideas of
fine-tuning-based transfer learning (FTT) and feature-based knowledge distillation. By
allowing the student to mimic its teacher in feature extraction, the gap in general
features between the teacher and student networks is bridged. The Image Party
distillation enhancement method (IP) is also proposed, which shuffles images from
various domains and randomly selects a few to form each mini-batch. With this approach,
overfitting of the student network to the general features of the teacher network can
be easily avoided. Persuasive experiments and discussions are conducted on Cityscapes
with the assistance of COCO2017 and KITTI. Results demonstrate boosted segmentation
performance: mIoU and high-precision accuracy improve by 1.4% and 8.2% respectively,
with 78.2% output variance, and a precise compact network is obtained with only 41.8%
of the FLOPs (see Fig. 1). This paper is a pioneering work on knowledge distillation
applied to few-shot learning. The proposed methods significantly reduce the dependence
of DNN training on data and improve the robustness of DNNs when facing rare situations,
while satisfying the real-time requirement. We provide important technical support for
the advancement of scene perception technology for autonomous driving.
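The two mechanisms described in the abstract, feature mimicking and Image Party (IP) mini-batch construction, lend themselves to a short illustration. The following PyTorch-style sketch is a minimal interpretation, not the authors' released code: the class and function names, the L2 mimicking loss, and the 1x1 channel adapter are assumptions made for illustration only.

```python
# Minimal sketch of the two ideas in the abstract; NOT the authors' implementation.
# Assumptions (hypothetical): an L2 feature-mimicking loss against a frozen teacher,
# and a 1x1 adapter to reconcile channel widths between student and teacher features.
import random

import torch.nn as nn
import torch.nn.functional as F


class FeatureMimicLoss(nn.Module):
    """Feature-based distillation: push student feature maps toward the teacher's."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 projection so the two feature maps are comparable (assumed detail).
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        student_feat = self.adapter(student_feat)
        if student_feat.shape[-2:] != teacher_feat.shape[-2:]:
            student_feat = F.interpolate(
                student_feat, size=teacher_feat.shape[-2:],
                mode="bilinear", align_corners=False)
        # Teacher features are detached: only the student learns to mimic them.
        return F.mse_loss(student_feat, teacher_feat.detach())


def image_party_batches(domain_datasets, batch_size):
    """Image Party style sampling as described in the abstract: pool samples from
    several domains, shuffle them, and draw small mixed mini-batches."""
    pooled = [sample for dataset in domain_datasets for sample in dataset]
    random.shuffle(pooled)
    for i in range(0, len(pooled), batch_size):
        yield pooled[i:i + batch_size]
```

In training, the mimicking term would simply be added to the ordinary segmentation loss, e.g. `loss = seg_loss + lam * mimic(student_feat, teacher_feat)`; the weighting scheme is not specified in the abstract and is left here as a hyperparameter.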
Related papers
- Self-Distillation Learning Based on Temporal-Spatial Consistency for Spiking Neural Networks [3.7748662901422807]
Spiking neural networks (SNNs) have attracted considerable attention for their event-driven, low-power characteristics and high biological interpretability.
Recent research has improved the performance of the SNN model with a pre-trained teacher model.
In this paper, we explore cost-effective self-distillation learning of SNNs to circumvent these concerns.
arXiv Detail & Related papers (2024-06-12T04:30:40Z) - Knowledge Distillation on Spatial-Temporal Graph Convolutional Network for Traffic Prediction [3.0450307343472405]
We introduce a cost function designed to train a network with fewer parameters (the student) using distilled data from a complex network (the teacher).
We use knowledge distillation, incorporating spatial-temporal correlations from the teacher network to enable the student to learn the complex patterns perceived by the teacher.
Our method can maintain the student's accuracy close to that of the teacher, even when retaining only 3% of the network parameters.
arXiv Detail & Related papers (2024-01-22T09:54:49Z) - Feature-domain Adaptive Contrastive Distillation for Efficient Single Image Super-Resolution [3.2453621806729234]
CNN-based SISR methods require numerous parameters and high computational cost to achieve better performance.
Knowledge Distillation (KD) transfers the teacher's useful knowledge to the student.
We propose a feature-domain adaptive contrastive distillation (FACD) method for efficiently training lightweight student SISR networks.
arXiv Detail & Related papers (2022-11-29T06:24:14Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Parameter-Efficient and Student-Friendly Knowledge Distillation [83.56365548607863]
We present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer.
Experiments on a variety of benchmarks show that PESF-KD can significantly reduce the training cost while obtaining competitive results compared to advanced online distillation methods.
arXiv Detail & Related papers (2022-05-28T16:11:49Z) - lpSpikeCon: Enabling Low-Precision Spiking Neural Network Processing for Efficient Unsupervised Continual Learning on Autonomous Agents [14.916996986290902]
We propose lpSpikeCon, a novel methodology to enable low-precision SNN processing for efficient unsupervised continual learning.
Our lpSpikeCon can reduce weight memory of the SNN model by 8x (i.e., by judiciously employing 4-bit weights) for performing online training with unsupervised continual learning.
arXiv Detail & Related papers (2022-05-24T18:08:16Z) - Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning that considers both the uncertainty and the robustness of the detector.
Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
arXiv Detail & Related papers (2021-06-22T16:53:09Z) - Dual Discriminator Adversarial Distillation for Data-free Model Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator to create samples through dual discriminator adversarial distillation, which mimics the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
arXiv Detail & Related papers (2021-04-12T12:01:45Z) - S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills knowledge from real-valued networks to binary networks on the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z) - On Self-Distilling Graph Neural Network [64.00508355508106]
We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD).
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
arXiv Detail & Related papers (2020-11-04T12:29:33Z) - Circumventing Outliers of AutoAugment with Knowledge Distillation [102.25991455094832]
AutoAugment has been a powerful algorithm that improves the accuracy of many vision tasks.
This paper delves deep into the working mechanism, and reveals that AutoAugment may remove part of discriminative information from the training image.
To relieve the inaccuracy of supervision, we make use of knowledge distillation that refers to the output of a teacher model to guide network training.
arXiv Detail & Related papers (2020-03-25T11:51:41Z)