Data-Free Knowledge Distillation with Soft Targeted Transfer Set
Synthesis
- URL: http://arxiv.org/abs/2104.04868v1
- Date: Sat, 10 Apr 2021 22:42:14 GMT
- Title: Data-Free Knowledge Distillation with Soft Targeted Transfer Set
Synthesis
- Authors: Zi Wang
- Abstract summary: Knowledge distillation (KD) has proved to be an effective approach for deep neural network compression.
In traditional KD, the transferred knowledge is usually obtained by feeding training samples to the teacher network.
The original training dataset is not always available due to storage costs or privacy issues.
We propose a novel data-free KD approach by modeling the intermediate feature space of the teacher.
- Score: 8.87104231451079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation (KD) has proved to be an effective approach for deep
neural network compression, which learns a compact network (student) by
transferring the knowledge from a pre-trained, over-parameterized network
(teacher). In traditional KD, the transferred knowledge is usually obtained by
feeding training samples to the teacher network to obtain the class
probabilities. However, the original training dataset is not always available
due to storage costs or privacy issues. In this study, we propose a novel
data-free KD approach by modeling the intermediate feature space of the teacher
with a multivariate normal distribution and leveraging the soft targeted labels
generated by the distribution to synthesize pseudo samples as the transfer set.
Several student networks trained with these synthesized transfer sets present
competitive performance compared to the networks trained with the original
training set and other data-free KD approaches.
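The pipeline described in the abstract can be pictured with a brief sketch. The split of the teacher into a feature extractor and a classification head, the source of the MVN parameters `mu` and `sigma`, the image shape, and the loss weighting below are all assumptions made for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def synthesize_transfer_set(feat_extractor, head, mu, sigma,
                            n_samples=64, img_shape=(3, 32, 32),
                            steps=200, lr=0.05, T=4.0):
    """Sample intermediate features from an MVN, derive soft targets from the
    teacher's head, and optimize pseudo inputs whose teacher features match them.
    The teacher is assumed frozen; only the pseudo inputs x are updated."""
    mvn = torch.distributions.MultivariateNormal(mu, covariance_matrix=sigma)
    with torch.no_grad():
        target_feats = mvn.sample((n_samples,))                  # features drawn from N(mu, Sigma)
        soft_targets = F.softmax(head(target_feats) / T, dim=1)  # soft targeted labels

    x = torch.randn(n_samples, *img_shape, requires_grad=True)   # pseudo samples to optimize
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feats = feat_extractor(x)
        loss = F.mse_loss(feats, target_feats) + \
               F.kl_div(F.log_softmax(head(feats) / T, dim=1), soft_targets,
                        reduction="batchmean")
        loss.backward()
        opt.step()
    return x.detach(), soft_targets
```

The returned (pseudo sample, soft target) pairs would then serve as the transfer set for training a student with a standard distillation loss.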
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
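A minimal sketch of this interleaved sampling idea, assuming hypothetical `student_logits` and `teacher_logits` callables that return next-token logits for a token prefix; the top-k acceptance rule below is an illustrative stand-in, not necessarily SKD's exact criterion.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def interleaved_decode(student_logits, teacher_logits, prefix,
                       max_new_tokens=32, top_k=20):
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        proposal = int(torch.argmax(student_logits(tokens)))      # student proposes a token
        t_logits = teacher_logits(tokens)
        if proposal not in torch.topk(t_logits, top_k).indices:   # poorly ranked by the teacher
            proposal = int(torch.multinomial(F.softmax(t_logits, dim=-1), 1))  # teacher resamples
        tokens.append(proposal)
    return tokens
```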
arXiv Detail & Related papers (2024-10-15T06:51:25Z)
- Small Scale Data-Free Knowledge Distillation [37.708282211941416]
We propose Small Scale Data-free Knowledge Distillation (SSD-KD).
SSD-KD balances synthetic samples and a priority sampling function to select proper samples.
It can perform distillation training conditioned on an extremely small scale of synthetic samples.
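As a rough illustration of a priority sampling function, the sketch below draws more often from synthetic samples on which the student disagrees most with the teacher; the disagreement score and softmax weighting are assumptions, not SSD-KD's actual formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def priority_sample(pool, teacher, student, batch_size=32, temperature=1.0):
    t_prob = F.softmax(teacher(pool), dim=1)
    s_logp = F.log_softmax(student(pool), dim=1)
    score = F.kl_div(s_logp, t_prob, reduction="none").sum(dim=1)  # per-sample disagreement
    weights = F.softmax(score / temperature, dim=0)                # priority distribution
    idx = torch.multinomial(weights, batch_size, replacement=False)
    return pool[idx]
```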
arXiv Detail & Related papers (2024-06-12T05:09:41Z)
- Distribution Shift Matters for Knowledge Distillation with Webly Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^3$).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
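The instance-selection step might look roughly like the sketch below, which keeps the webly collected samples on which the combined teacher-student prediction is most confident; the mixing weight and top-k rule are assumptions, and the MixDistribution block is not shown.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_web_instances(x_web, teacher, student, alpha=0.5, keep_ratio=0.5):
    p_mix = alpha * F.softmax(teacher(x_web), dim=1) \
          + (1 - alpha) * F.softmax(student(x_web), dim=1)  # combined prediction
    confidence = p_mix.max(dim=1).values
    k = max(1, int(keep_ratio * x_web.size(0)))
    return x_web[torch.topk(confidence, k).indices]          # keep the most confident instances
```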
arXiv Detail & Related papers (2023-07-21T10:08:58Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal, and that the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
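A minimal example of such partial freezing, using an arbitrary torchvision backbone and block granularity purely for illustration:

```python
import torch.nn as nn
from torchvision.models import resnet18

def freeze_first_blocks(model: nn.Module, n_frozen: int) -> nn.Module:
    """Keep only the first n_frozen top-level blocks frozen; fine-tune the rest."""
    for block in list(model.children())[:n_frozen]:
        for p in block.parameters():
            p.requires_grad = False          # frozen feature-extractor layers
    return model

model = freeze_first_blocks(resnet18(weights="IMAGENET1K_V1"), n_frozen=5)
trainable = [p for p in model.parameters() if p.requires_grad]  # parameters to fine-tune
```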
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation [31.294947552032088]
Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the fundamental idea of carrying out knowledge transfer from a Teacher to a Student neural network in the absence of training data.
We propose a meta-learning inspired framework that treats Knowledge-Acquisition (learning from newly generated samples) and Knowledge-Retention (retaining knowledge of previously encountered samples) as meta-train and meta-test tasks, respectively.
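A heavily simplified, first-order sketch of the acquire-then-retain idea is given below; the actual framework differentiates through the inner update, which is omitted here, and the loss choices are assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student, teacher, x, T=4.0):
    with torch.no_grad():
        t = F.softmax(teacher(x) / T, dim=1)
    s = F.log_softmax(student(x) / T, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * T * T

def acquire_then_retain(student, teacher, x_new, x_memory, lr=1e-3):
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    opt.zero_grad()
    kd_loss(student, teacher, x_new).backward()      # knowledge acquisition (meta-train)
    opt.step()
    opt.zero_grad()
    kd_loss(student, teacher, x_memory).backward()   # knowledge retention (meta-test)
    opt.step()
```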
arXiv Detail & Related papers (2023-02-28T03:50:56Z)
- Parameter-Efficient and Student-Friendly Knowledge Distillation [83.56365548607863]
We present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer.
Experiments on a variety of benchmarks show that PESF-KD can significantly reduce the training cost while obtaining competitive results compared to advanced online distillation methods.
arXiv Detail & Related papers (2022-05-28T16:11:49Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
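A generic k-NN density estimator over a batch of output-space features looks like the sketch below; this is the textbook form, not necessarily the exact estimator used in OSAKD.

```python
import torch

def knn_density(features: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Density at each point falls with the distance to its k-th nearest neighbour."""
    dists = torch.cdist(features, features)                    # (N, N) pairwise distances
    r_k = dists.topk(k + 1, largest=False).values[:, k]        # k-th neighbour distance (skip self)
    d = features.size(1)
    return k / (features.size(0) * r_k.clamp_min(1e-12) ** d)  # proportional to k / (N * r_k^d)
```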
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation [28.874162427052905]
We investigate the effectiveness of "arbitrary transfer sets" such as random noise, publicly available synthetic, and natural datasets.
We find that arbitrary data can be surprisingly effective for knowledge distillation when the transfer set is "target-class balanced".
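One way to build such a target-class balanced transfer set from arbitrary data is sketched below, using the teacher's predicted classes to balance the selection; the balancing rule and per-class cap are illustrative assumptions.

```python
import torch

@torch.no_grad()
def balanced_transfer_set(x_arbitrary, teacher, num_classes, per_class=100):
    preds = teacher(x_arbitrary).argmax(dim=1)   # teacher-assigned target classes
    keep = torch.cat([(preds == c).nonzero(as_tuple=True)[0][:per_class]
                      for c in range(num_classes)])
    return x_arbitrary[keep], preds[keep]
```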
arXiv Detail & Related papers (2020-11-18T06:33:20Z)
- Towards Accurate Quantization and Pruning via Data-free Knowledge Transfer [61.85316480370141]
We study data-free quantization and pruning by transferring knowledge from trained large networks to compact networks.
Our data-free compact networks achieve competitive accuracy to networks trained and fine-tuned with training data.
arXiv Detail & Related papers (2020-10-14T18:02:55Z)
- Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks [27.533162215182422]
Quantization of deep neural networks (QDNNs) has been actively studied for deployment in edge devices.
Recent studies employ the knowledge distillation (KD) method to improve the performance of quantized networks.
In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ).
arXiv Detail & Related papers (2020-09-30T08:38:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.