Adversarial Self-Supervised Data-Free Distillation for Text
Classification
- URL: http://arxiv.org/abs/2010.04883v1
- Date: Sat, 10 Oct 2020 02:46:06 GMT
- Title: Adversarial Self-Supervised Data-Free Distillation for Text
Classification
- Authors: Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia,
Weiming Lu
- Abstract summary: We propose a novel two-stage data-free distillation method, named Adversarial self-Supervised Data-Free Distillation (AS-DFD)
Our framework is the first data-free distillation framework designed for NLP tasks.
- Score: 13.817252068643066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained transformer-based language models have achieved impressive
results on a wide range of NLP tasks. In the past few years, Knowledge
Distillation(KD) has become a popular paradigm to compress a computationally
expensive model to a resource-efficient lightweight model. However, most KD
algorithms, especially in NLP, rely on the accessibility of the original
training dataset, which may be unavailable due to privacy issues. To tackle
this problem, we propose a novel two-stage data-free distillation method, named
Adversarial self-Supervised Data-Free Distillation (AS-DFD), which is designed
for compressing large-scale transformer-based models (e.g., BERT). To avoid
text generation in discrete space, we introduce a Plug & Play Embedding
Guessing method to craft pseudo embeddings from the teacher's hidden knowledge.
Meanwhile, with a self-supervised module to quantify the student's ability, we
adapt the difficulty of pseudo embeddings in an adversarial training manner. To
the best of our knowledge, our framework is the first data-free distillation
framework designed for NLP tasks. We verify the effectiveness of our method on
several text classification datasets.
Related papers
- Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models [25.91643745340183]
Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora.
This poses risk of privacy and copyright violations, highlighting the need for efficient machine unlearning methods.
We propose two novel techniques for robust and efficient unlearning for LLMs.
arXiv Detail & Related papers (2024-08-13T04:18:32Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - Lightweight Self-Knowledge Distillation with Multi-source Information
Fusion [3.107478665474057]
Knowledge Distillation (KD) is a powerful technique for transferring knowledge between neural network models.
We propose a lightweight SKD framework that utilizes multi-source information to construct a more informative teacher.
We validate the performance of the proposed DRG, DSR, and their combination through comprehensive experiments on various datasets and models.
arXiv Detail & Related papers (2023-05-16T05:46:31Z) - Prompting to Distill: Boosting Data-Free Knowledge Distillation via
Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) conducts knowledge distillation via eliminating the dependence of original training data.
We propose a prompt-based method, termed as PromptDFD, that allows us to take advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize k-nn non-parametric density estimation technique for estimating the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation [9.91548921801095]
We present, MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.
We evaluate our algorithm, using BERT-based models, on the GLUE benchmark and demonstrate that MATE-KD outperforms competitive adversarial learning and data augmentation baselines.
arXiv Detail & Related papers (2021-05-12T19:11:34Z) - Dual Discriminator Adversarial Distillation for Data-free Model
Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator to create samples through dual discriminator adversarial distillation, which mimics the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
arXiv Detail & Related papers (2021-04-12T12:01:45Z) - Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation [55.34995029082051]
We propose a method to learn to augment for data-scarce domain BERT knowledge distillation.
We show that the proposed method significantly outperforms state-of-the-art baselines on four different tasks.
arXiv Detail & Related papers (2021-01-20T13:07:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.