Spirit Distillation: A Model Compression Method with Multi-domain
Knowledge Transfer
- URL: http://arxiv.org/abs/2104.14696v1
- Date: Thu, 29 Apr 2021 23:19:51 GMT
- Title: Spirit Distillation: A Model Compression Method with Multi-domain
Knowledge Transfer
- Authors: Zhiyuan Wu, Yu Jiang, Minghao Zhao, Chupeng Cui, Zongmin Yang, Xinhui
Xue, Hong Qi
- Abstract summary: We propose a new knowledge distillation model, named Spirit Distillation (SD), which is a model compression method with multi-domain knowledge transfer.
Results demonstrate that our method can boost mIOU and high-precision accuracy by 1.4% and 8.2% respectively with 78.2% segmentation variance.
- Score: 5.0919090307185035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent applications pose requirements of both cross-domain knowledge transfer
and model compression to machine learning models due to insufficient training
data and limited computational resources. In this paper, we propose a new
knowledge distillation model, named Spirit Distillation (SD), which is a model
compression method with multi-domain knowledge transfer. The compact student
network mimics a representation equivalent to that produced by the front part of
the teacher network, through which general knowledge can be transferred from the
source domain (teacher) to the target domain (student). To further improve the
robustness of the student, we extend SD to Enhanced Spirit Distillation (ESD),
which exploits more comprehensive knowledge by introducing a proximity domain,
similar to the target domain, for feature extraction. Results
demonstrate that our method can boost mIOU and high-precision accuracy by 1.4%
and 8.2% respectively with 78.2% segmentation variance, and can gain a precise
compact network with only 41.8% FLOPs.
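The core mechanism described above, a compact student reproducing the representation of the teacher's front layers, can be illustrated with a minimal sketch. This is not the authors' code; the module names and the L2 matching term are assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpiritDistillationLoss(nn.Module):
    """Sketch: match student features to the frozen front part of a teacher
    pretrained on the source domain (general-knowledge transfer)."""

    def __init__(self, teacher_front: nn.Module):
        super().__init__()
        self.teacher_front = teacher_front.eval()
        for p in self.teacher_front.parameters():
            p.requires_grad_(False)  # source-domain knowledge stays fixed

    def forward(self, student_feat: torch.Tensor, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            teacher_feat = self.teacher_front(images)
        # Assumes matching channel widths; a 1x1 adapter would be needed otherwise.
        if student_feat.shape[-2:] != teacher_feat.shape[-2:]:
            teacher_feat = F.interpolate(teacher_feat, size=student_feat.shape[-2:])
        return F.mse_loss(student_feat, teacher_feat)
```
In training, such a mimicry term would be added to the target-domain task loss (e.g., segmentation cross-entropy) with a weighting coefficient; ESD would additionally draw images from the proximity domain when computing the feature loss.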
Related papers
- Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
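As a rough illustration of the Fourier-based adapter idea, consider the hedged sketch below; the learnable frequency mask is an assumption, not the 4Ds design, and the fusion-activation mechanism is omitted.
```python
import torch
import torch.nn as nn
import torch.fft

class FourierAdapter(nn.Module):
    """Sketch: a learnable gate over the 2-D spectrum of a feature map,
    intended to select (assumed) domain-invariant frequency content."""

    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable gate per frequency bin (rfft2 halves the last dimension).
        self.mask = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(feat, norm="ortho")       # to the frequency domain
        filtered = spec * torch.sigmoid(self.mask)       # gate frequency bins
        return torch.fft.irfft2(filtered, s=feat.shape[-2:], norm="ortho")
```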
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
- Online Knowledge Distillation for Efficient Pose Estimation [37.81478634850458]
We investigate a novel Online Knowledge Distillation framework by distilling Human Pose structure knowledge in a one-stage manner.
OKDHP trains a single multi-branch network and acquires the predicted heatmaps from each branch.
The pixel-wise Kullback-Leibler divergence is utilized to minimize the discrepancy between the target heatmaps and the predicted ones.
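A minimal sketch of such a pixel-wise Kullback-Leibler term is shown below; the function name, normalization, and temperature are assumptions rather than the OKDHP code.
```python
import torch
import torch.nn.functional as F

def pixelwise_kl(pred_heatmaps: torch.Tensor,
                 target_heatmaps: torch.Tensor,
                 temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between per-keypoint heatmaps of shape (batch, K, H, W),
    each treated as a distribution over its H*W pixels."""
    b, k, h, w = pred_heatmaps.shape
    pred = F.log_softmax(pred_heatmaps.view(b, k, -1) / temperature, dim=-1)
    target = F.softmax(target_heatmaps.view(b, k, -1) / temperature, dim=-1)
    return F.kl_div(pred, target, reduction="batchmean")
```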
arXiv Detail & Related papers (2021-08-04T14:49:44Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
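As a very rough illustration of matching intermediate features through sparse codes, the sketch below uses a 1x1 dictionary projection with a ReLU as a soft sparsity, which is a crude stand-in and not the SRM formulation in the paper.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseCodeBlock(nn.Module):
    """Project a feature map onto a learned dictionary of atoms; the ReLU keeps
    only positive responses, giving sparse-ish codes (an assumption, not SRM)."""

    def __init__(self, in_channels: int, num_atoms: int):
        super().__init__()
        self.dictionary = nn.Conv2d(in_channels, num_atoms, kernel_size=1, bias=False)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return F.relu(self.dictionary(feat))

def srm_style_loss(student_codes: torch.Tensor, teacher_codes: torch.Tensor) -> torch.Tensor:
    # Match the two code maps; resolution differences would need resizing first.
    return F.mse_loss(student_codes, teacher_codes.detach())
```
Because teacher and student each get their own projection into a shared atom space, this kind of matching tolerates different channel widths, which is consistent with the robustness to architectural differences noted above.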
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Spirit Distillation: Precise Real-time Prediction with Insufficient Data [4.6247655021017655]
We propose a new training framework named Spirit Distillation (SD).
It extends the ideas of fine-tuning-based transfer learning (FTT) and feature-based knowledge distillation.
Results demonstrate that segmentation performance (mIOU) and high-precision accuracy are boosted by 1.4% and 8.2%, respectively.
arXiv Detail & Related papers (2021-03-25T10:23:30Z)
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation [12.097302014936655]
This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD).
Our proposed method, FRSKD, can utilize both soft-label and feature-map distillation for self-knowledge distillation.
We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
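For intuition, a generic self-distillation objective combining soft-label and feature-map terms might look like the sketch below; this is a simplified stand-in, FRSKD's actual feature-refinement network is described in the paper, and the weights and temperature are assumptions.
```python
import torch
import torch.nn.functional as F

def self_distillation_loss(logits, refined_logits, feat, refined_feat, labels,
                           temperature: float = 4.0,
                           alpha: float = 1.0, beta: float = 1.0):
    """Task loss + soft-label KL + feature-map mimicry, as a generic sketch."""
    task = F.cross_entropy(logits, labels)
    soft = F.kl_div(
        F.log_softmax(logits / temperature, dim=1),
        F.softmax(refined_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    feat_match = F.mse_loss(feat, refined_feat.detach())
    return task + alpha * soft + beta * feat_match
```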
arXiv Detail & Related papers (2021-03-15T10:59:43Z)
- Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation [55.34995029082051]
We propose a method to learn to augment for data-scarce domain BERT knowledge distillation.
We show that the proposed method significantly outperforms state-of-the-art baselines on four different tasks.
arXiv Detail & Related papers (2021-01-20T13:07:39Z)
- Dual-Teacher++: Exploiting Intra-domain and Inter-domain Knowledge with Reliable Transfer for Cardiac Segmentation [69.09432302497116]
We propose a cutting-edge semi-supervised domain adaptation framework, namely Dual-Teacher++.
We design novel dual teacher models, including an inter-domain teacher model to explore cross-modality priors from the source domain (e.g., MR) and an intra-domain teacher model to investigate the knowledge beneath the unlabeled target domain.
In this way, the student model can obtain reliable dual-domain knowledge and yield improved performance on target domain data.
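Conceptually, the student combines supervision from two frozen teachers, as in the hedged sketch below; the weighting and loss forms are assumptions, and the paper's reliable-transfer mechanism for gating unreliable teacher predictions is omitted.
```python
import torch
import torch.nn.functional as F

def dual_teacher_loss(student_logits, inter_domain_logits, intra_domain_logits,
                      labels=None, w_inter: float = 1.0, w_intra: float = 1.0):
    """Distill a student from an inter-domain and an intra-domain teacher."""
    log_p = F.log_softmax(student_logits, dim=1)
    loss = w_inter * F.kl_div(log_p, F.softmax(inter_domain_logits.detach(), dim=1),
                              reduction="batchmean")
    loss = loss + w_intra * F.kl_div(log_p, F.softmax(intra_domain_logits.detach(), dim=1),
                                     reduction="batchmean")
    if labels is not None:  # labeled target-domain data, when available
        loss = loss + F.cross_entropy(student_logits, labels)
    return loss
```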
arXiv Detail & Related papers (2021-01-07T05:17:38Z)
- Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
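The global (dual-form) term is a contrastive objective between teacher and student representations; the sketch below uses a generic InfoNCE-style bound as a stand-in, not the WCoRD critic, whose exact form (and the primal local term) is given in the paper.
```python
import torch
import torch.nn.functional as F

def contrastive_kd_loss(student_feat: torch.Tensor,
                        teacher_feat: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """Matching (student, teacher) pairs in a batch are positives, all other
    pairings are negatives; minimizing this maximizes a lower bound on the
    mutual information between the two representations."""
    s = F.normalize(student_feat.flatten(1), dim=1)
    t = F.normalize(teacher_feat.flatten(1), dim=1)
    logits = s @ t.t() / temperature                 # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)
```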
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method consistently improves standard fine-tuning by more than 2% on average.
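In spirit, the disentangled target-relevant representation serves as an extra regularizer during fine-tuning; in the hedged sketch below, `relevant_source_feat` is assumed to be the output of that disentanglement, whose actual construction is defined in the paper.
```python
import torch
import torch.nn.functional as F

def tred_style_finetune_loss(logits, labels, target_feat, relevant_source_feat,
                             reg_weight: float = 0.1):
    """Standard fine-tuning loss plus a penalty keeping the target model's
    features close to the target-relevant part of the source representation."""
    task = F.cross_entropy(logits, labels)
    reg = F.mse_loss(target_feat, relevant_source_feat.detach())
    return task + reg_weight * reg
```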
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution [17.996541285382463]
We propose extracurricular learning to bridge the gap between a compressed student model and its teacher.
We conduct rigorous evaluations on regression and classification tasks and show that compared to the standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%.
This leads to major accuracy improvements compared to the empirical risk minimization-based training for various recent neural network architectures.
arXiv Detail & Related papers (2020-06-30T18:21:21Z)