Prompting to Distill: Boosting Data-Free Knowledge Distillation via
Reinforced Prompt
- URL: http://arxiv.org/abs/2205.07523v1
- Date: Mon, 16 May 2022 08:56:53 GMT
- Title: Prompting to Distill: Boosting Data-Free Knowledge Distillation via
Reinforced Prompt
- Authors: Xinyin Ma, Xinchao Wang, Gongfan Fang, Yongliang Shen and Weiming Lu
- Abstract summary: Data-free knowledge distillation (DFKD) conducts knowledge distillation by eliminating the dependence on the original training data.
We propose a prompt-based method, termed PromptDFD, that allows us to take advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements in distillation performance.
- Score: 52.6946016535059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-free knowledge distillation (DFKD) conducts knowledge distillation
by eliminating the dependence on the original training data, and has recently achieved
impressive results in accelerating pre-trained language models. At the heart of
DFKD is to reconstruct a synthetic dataset by inverting the parameters of the
uncompressed model. Prior DFKD approaches, however, have largely relied on
hand-crafted priors of the target data distribution for the reconstruction,
which can be inevitably biased and often incompetent to capture the intrinsic
distributions. To address this problem, we propose a prompt-based method,
termed PromptDFD, that allows us to take advantage of learned language
priors, which effectively regularize the synthetic sentences to be semantically
and grammatically correct. Specifically, PromptDFD leverages a pre-trained
generative model to provide language priors and introduces a reinforced topic
prompter to control data synthesis, making the generated samples thematically
relevant and semantically plausible, and thus friendly to downstream tasks. As
shown in our experiments, the proposed method substantially improves the
synthesis quality and achieves considerable improvements in distillation
performance. In some cases, PromptDFD even yields results on par with
those of data-driven knowledge distillation with access to the original
training data.
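The abstract describes the pipeline only at a high level, so below is a minimal, illustrative sketch of how a PromptDFD-style loop could be wired together. It is not the authors' implementation: the choice of GPT-2 as the language prior, the SST-2 teacher checkpoint, the hand-picked topic vocabulary, the teacher-confidence reward, and the plain REINFORCE update on the topic prompter are all assumptions made for brevity.

```python
# Illustrative PromptDFD-style sketch (not the authors' code).
# Assumptions: GPT-2 language prior, a fine-tuned BERT classifier as teacher,
# a tiny topic vocabulary, teacher-confidence reward, plain REINFORCE prompter.
import torch
import torch.nn.functional as F
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForSequenceClassification)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pre-trained generative model supplying language priors.
gen_tok = AutoTokenizer.from_pretrained("gpt2")
generator = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

# Teacher: the uncompressed task model (placeholder checkpoint, assumption).
teacher_name = "textattack/bert-base-uncased-SST-2"
cls_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_name).to(device).eval()
student = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=teacher.config.num_labels).to(device)
student.train()

# Reinforced topic prompter: a categorical policy over a hand-picked topic list.
topics = ["movie", "music", "sports", "politics", "food", "travel"]
prompt_logits = torch.zeros(len(topics), device=device, requires_grad=True)
prompter_opt = torch.optim.Adam([prompt_logits], lr=1e-2)
student_opt = torch.optim.AdamW(student.parameters(), lr=3e-5)
T = 4.0  # distillation temperature

for step in range(100):
    # 1) Sample a topic from the prompter policy and build a prompt.
    dist = torch.distributions.Categorical(logits=prompt_logits)
    idx = dist.sample()
    prompt = f"Write a review about {topics[idx.item()]}:"

    # 2) Synthesize sentences with the pre-trained LM (language prior).
    inp = gen_tok(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        out = generator.generate(**inp, do_sample=True, top_p=0.95,
                                 max_new_tokens=32, num_return_sequences=8,
                                 pad_token_id=gen_tok.eos_token_id)
    texts = gen_tok.batch_decode(out[:, inp["input_ids"].shape[1]:],
                                 skip_special_tokens=True)

    # 3) Distill: match student logits to the teacher's soft predictions.
    batch = cls_tok(texts, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        t_logits = teacher(input_ids=batch["input_ids"],
                           attention_mask=batch["attention_mask"]).logits
    s_logits = student(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
    kd_loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                       F.softmax(t_logits / T, dim=-1),
                       reduction="batchmean") * T * T
    student_opt.zero_grad(); kd_loss.backward(); student_opt.step()

    # 4) Reinforce the prompter. Reward design is an assumption: here, high
    #    teacher confidence is a proxy for task-relevant, plausible samples.
    reward = F.softmax(t_logits, dim=-1).max(dim=-1).values.mean()
    pg_loss = -dist.log_prob(idx) * reward
    prompter_opt.zero_grad(); pg_loss.backward(); prompter_opt.step()
```

The sketch only shows where the three ingredients interact: the frozen language model supplies the prior, the topic prompter is the only reinforced component, and the student sees nothing but synthetic text labeled by the teacher. The paper's actual prompter parameterization and reward design are more elaborate.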
Related papers
- Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation [20.556083321381514]
Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression.
This paper introduces an innovative approach to DFKD through diverse diffusion augmentation (DDA).
Comprehensive experiments conducted on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets showcase the superior performance of our method.
arXiv Detail & Related papers (2024-10-23T07:01:16Z)
- Towards a Theoretical Understanding of Memorization in Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for Generative Artificial Intelligence (GenAI)
We provide a theoretical understanding of memorization in both conditional and unconditional DPMs under the assumption of model convergence.
We propose a novel data extraction method named Surrogate condItional Data Extraction (SIDE) that leverages a time-dependent classifier trained on the generated data as a surrogate condition to extract training data from unconditional DPMs.
arXiv Detail & Related papers (2024-10-03T13:17:06Z)
- Small Scale Data-Free Knowledge Distillation [37.708282211941416]
We propose Small Scale Data-free Knowledge Distillation (SSD-KD).
SSD-KD balances the synthetic samples and uses a priority sampling function to select proper samples.
It can perform distillation training conditioned on an extremely small scale of synthetic samples.
arXiv Detail & Related papers (2024-06-12T05:09:41Z)
- De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts [32.1016787150064]
Data-Free Knowledge Distillation (DFKD) is a promising task to train high-performance small models for practical deployment without relying on the original training data.
Existing methods commonly avoid relying on private data by utilizing synthetic or sampled data; a generic synthesize-and-distill loop in this spirit is sketched after this list.
This paper proposes a novel perspective based on causal inference to disentangle the student model from the impact of such distribution shifts.
arXiv Detail & Related papers (2024-03-28T16:13:22Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has recently been proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data.
Our method can be applied to general augmentation techniques and consistently improve the performance on both text classification and question-answering tasks.
arXiv Detail & Related papers (2022-12-20T18:58:33Z)
- Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay [5.3330804968579795]
Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data.
Existing works use a validation set to monitor the accuracy of the student over real data and report the highest performance throughout the entire process.
However, validation data may not be available at distillation time either, making it infeasible to record the student snapshot that achieved the peak accuracy.
This is challenging because the student experiences knowledge degradation due to the distribution shift of the synthetic data.
We propose to model the distribution of the previously observed synthetic samples so that pseudo samples can be replayed during distillation.
arXiv Detail & Related papers (2022-01-09T14:14:28Z)
- Up to 100x Faster Data-free Knowledge Distillation [52.666615987503995]
We introduce FastDFKD, which allows us to accelerate DFKD by orders of magnitude.
Unlike prior methods that optimize a set of data independently, we propose to learn a meta-synthesizer that seeks common features.
FastDFKD achieves data synthesis within only a few steps, significantly enhancing the efficiency of data-free training.
arXiv Detail & Related papers (2021-12-12T14:56:58Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Adversarial Self-Supervised Data-Free Distillation for Text Classification [13.817252068643066]
We propose a novel two-stage data-free distillation method, named Adversarial self-Supervised Data-Free Distillation (AS-DFD).
It is the first data-free distillation framework designed for NLP tasks.
arXiv Detail & Related papers (2020-10-10T02:46:06Z)
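Several of the papers above (DDA, SSD-KD, De-confounded DFKD, FastDFKD) share the same synthesize-then-distill skeleton on vision benchmarks, referenced earlier in the list. The sketch below is a generic, model-inversion-style rendition of that skeleton, not the algorithm of any single listed paper; the ResNet architectures, CIFAR-sized inputs, total-variation regularizer, and all hyperparameters are assumptions chosen for brevity.

```python
# Generic data-free KD loop (illustrative only): invert the teacher to get a
# synthetic batch, then distill the student on it.
import torch
import torch.nn.functional as F
from torchvision.models import resnet34, resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"
teacher = resnet34(num_classes=10).to(device).eval()   # stand-in for a trained teacher
for p in teacher.parameters():
    p.requires_grad_(False)
student = resnet18(num_classes=10).to(device)
student_opt = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)
T = 4.0  # distillation temperature

for outer in range(200):
    # 1) Synthesize a batch by inverting the teacher: optimize noise images so
    #    the teacher confidently predicts randomly chosen target labels.
    x = torch.randn(64, 3, 32, 32, device=device, requires_grad=True)
    y = torch.randint(0, 10, (64,), device=device)
    img_opt = torch.optim.Adam([x], lr=0.05)
    for _ in range(30):
        logits = teacher(x)
        # Cross-entropy pulls images toward the target classes; total variation
        # keeps them smooth (a stand-in for the richer priors used in practice).
        tv = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean() + \
             (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
        loss = F.cross_entropy(logits, y) + 1e-3 * tv
        img_opt.zero_grad(); loss.backward(); img_opt.step()

    # 2) Distill on the synthesized batch: student mimics the teacher's soft outputs.
    x = x.detach()
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    student_opt.zero_grad(); kd.backward(); student_opt.step()
```

The listed papers differ mainly in how they improve step 1: replacing the simple total-variation term with richer priors, diffusion-based augmentation, priority sampling of the synthetic pool, or meta-learned initializations for faster synthesis.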