Data-Free Distillation of Language Model by Text-to-Text Transfer
- URL: http://arxiv.org/abs/2311.01689v1
- Date: Fri, 3 Nov 2023 03:31:47 GMT
- Title: Data-Free Distillation of Language Model by Text-to-Text Transfer
- Authors: Zheyuan Bai, Xinduo Liu, Hailin Hu, Tianyu Guo, Qinghua Zhang, Yunhe Wang
- Abstract summary: Data-Free Knowledge Distillation (DFKD) plays a vital role in compressing the model when original training data is unavailable.
We propose a novel DFKD framework, namely DFKD-T$^{3}$, where the pretrained generative language model can also serve as a controllable data generator for model compression.
Our method can boost the distillation performance in various downstream tasks such as sentiment analysis, linguistic acceptability, and information extraction.
- Score: 22.830164917398623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-Free Knowledge Distillation (DFKD) plays a vital role in compressing the
model when original training data is unavailable. Previous works for DFKD in
NLP mainly focus on distilling encoder-only models such as BERT for classification tasks, overlooking the notable progress of generative language modeling. In this work, we propose a novel DFKD framework, namely DFKD-T$^{3}$, in which a pretrained generative language model also serves as a controllable data generator for model compression. DFKD-T$^{3}$ is an end-to-end learnable text-to-text framework that transforms a general-domain corpus into compression-friendly task data, aiming to improve both \textit{specificity} and \textit{diversity}.
Extensive experiments show that our method can boost the distillation
performance in various downstream tasks such as sentiment analysis, linguistic
acceptability, and information extraction. Furthermore, we show that the
generated texts can be directly used for distilling other language models and
outperform the SOTA methods, making our method more appealing in a general DFKD
setting. Our code is available at
https://gitee.com/mindspore/models/tree/master/research/nlp/DFKD_T3.
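For readers unfamiliar with the setting, the sketch below illustrates the generic data-free distillation loop the abstract builds on: a pretrained generative language model synthesizes pseudo task data from a general-domain prompt, the teacher (whose training data is unavailable) labels that text with soft predictions, and a smaller student is trained to match them. This is a minimal sketch only, not the authors' DFKD-T$^{3}$ pipeline (their MindSpore code is linked above); the model names, the sentiment-style prompt, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a generic data-free distillation loop: a generative LM
# synthesizes pseudo task data, a teacher classifier labels it with soft
# logits, and a student is trained to match them. NOT the authors' DFKD-T^3
# pipeline; model names and hyperparameters are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Generator: any pretrained causal LM (placeholder: GPT-2).
gen_tok = AutoTokenizer.from_pretrained("gpt2")
gen_tok.pad_token = gen_tok.eos_token
generator = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

# Teacher: a task-specific classifier whose original training data is assumed
# unavailable (placeholder: an SST-2 sentiment model). Student: a tiny BERT.
# Both happen to share the uncased WordPiece vocabulary, so one tokenizer is
# reused for brevity.
cls_tok = AutoTokenizer.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english")
teacher = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english").to(device).eval()
student = AutoModelForSequenceClassification.from_pretrained(
    "google/bert_uncased_L-2_H-128_A-2", num_labels=2).to(device).train()

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-5)
T = 2.0  # distillation temperature (assumption)

for step in range(100):
    # 1) Synthesize pseudo task data from a short general-domain prompt.
    prompt = gen_tok(["The movie was"] * 8, return_tensors="pt").to(device)
    with torch.no_grad():
        out = generator.generate(**prompt, max_new_tokens=24, do_sample=True,
                                 top_p=0.95, pad_token_id=gen_tok.eos_token_id)
    texts = gen_tok.batch_decode(out, skip_special_tokens=True)

    # 2) Teacher provides soft labels for the synthesized texts.
    batch = cls_tok(texts, padding=True, truncation=True,
                    return_tensors="pt").to(device)
    with torch.no_grad():
        t_logits = teacher(**batch).logits

    # 3) Student mimics the teacher via temperature-scaled KL divergence.
    s_logits = student(**batch).logits
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The authors' framework goes further than this sketch: the generator itself is learned end-to-end to trade off the specificity and diversity of the synthesized text, whereas the sketch keeps it frozen.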
Related papers
- Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation [9.80836683456026]
We tackle data-to-text for isiXhosa, which is low-resource and agglutinative.
We introduce Triples-to-isiXhosa (T2X), a new dataset based on a subset of WebNLG.
We develop an evaluation framework for T2X that measures how accurately generated text describes the data.
arXiv Detail & Related papers (2024-03-12T11:53:27Z)
- Text-to-3D with Classifier Score Distillation [80.14832887529259]
In prior score distillation methods, classifier-free guidance is treated as an auxiliary trick rather than the essential component.
We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation.
We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing.
arXiv Detail & Related papers (2023-10-30T10:25:40Z)
- GECTurk: Grammatical Error Correction and Detection Dataset for Turkish [1.804922416527064]
Grammatical Error Detection and Correction (GEC) tools have proven useful for native speakers and second language learners.
Synthetic data generation is a common practice to overcome the scarcity of such data.
We present a flexible synthetic data generation pipeline for Turkish covering more than 20 expert-curated grammar and spelling rules.
arXiv Detail & Related papers (2023-09-20T14:25:44Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) conducts knowledge distillation while eliminating dependence on the original training data.
We propose a prompt-based method, termed PromptDFD, that allows us to take advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Adversarial Self-Supervised Data-Free Distillation for Text Classification [13.817252068643066]
We propose a novel two-stage data-free distillation method, named Adversarial self-Supervised Data-Free Distillation (AS-DFD).
Our framework is the first data-free distillation framework designed for NLP tasks.
arXiv Detail & Related papers (2020-10-10T02:46:06Z)
- Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling [4.525267347429154]
We condition a Transformer-based neural model on the BERT language model.
In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size.
The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset.
arXiv Detail & Related papers (2020-03-29T14:00:17Z)