Data-Free Distillation of Language Model by Text-to-Text Transfer
- URL: http://arxiv.org/abs/2311.01689v1
- Date: Fri, 3 Nov 2023 03:31:47 GMT
- Title: Data-Free Distillation of Language Model by Text-to-Text Transfer
- Authors: Zheyuan Bai, Xinduo Liu, Hailin Hu, Tianyu Guo, Qinghua Zhang, Yunhe Wang
- Abstract summary: Data-Free Knowledge Distillation (DFKD) plays a vital role in compressing the model when original training data is unavailable.
We propose a novel DFKD framework, namely DFKD-T$^{3}$, where the pretrained generative language model can also serve as a controllable data generator for model compression.
Our method can boost the distillation performance in various downstream tasks such as sentiment analysis, linguistic acceptability, and information extraction.
- Score: 22.830164917398623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-Free Knowledge Distillation (DFKD) plays a vital role in compressing the
model when original training data is unavailable. Previous works for DFKD in
NLP mainly focus on distilling encoder-only models such as BERT for classification tasks, overlooking the notable progress of generative language modeling. In this work, we propose a novel DFKD framework, namely DFKD-T$^{3}$, in which a pretrained generative language model also serves as a controllable data generator for model compression. DFKD-T$^{3}$ is an end-to-end learnable text-to-text framework that transforms a general-domain corpus into compression-friendly task data, aiming to improve both \textit{specificity} and \textit{diversity}.
Extensive experiments show that our method can boost the distillation
performance in various downstream tasks such as sentiment analysis, linguistic
acceptability, and information extraction. Furthermore, we show that the
generated texts can be directly used for distilling other language models and
outperform the SOTA methods, making our method more appealing in a general DFKD
setting. Our code is available at
https://gitee.com/mindspore/models/tree/master/research/nlp/DFKD_T3.
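For readers unfamiliar with the setting, the sketch below illustrates the generic data-free distillation loop the abstract builds on: a pretrained generative language model synthesizes pseudo task data from a general-domain prompt, the teacher (whose training data is unavailable) labels that text with soft predictions, and a smaller student is trained to match them. This is a minimal sketch only, not the authors' DFKD-T$^{3}$ pipeline (their MindSpore code is linked above); the model names, the sentiment-style prompt, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a generic data-free distillation loop: a generative LM
# synthesizes pseudo task data, a teacher classifier labels it with soft
# logits, and a student is trained to match them. NOT the authors' DFKD-T^3
# pipeline; model names and hyperparameters are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Generator: any pretrained causal LM (placeholder: GPT-2).
gen_tok = AutoTokenizer.from_pretrained("gpt2")
gen_tok.pad_token = gen_tok.eos_token
generator = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

# Teacher: a task-specific classifier whose original training data is assumed
# unavailable (placeholder: an SST-2 sentiment model). Student: a tiny BERT.
# Both happen to share the uncased WordPiece vocabulary, so one tokenizer is
# reused for brevity.
cls_tok = AutoTokenizer.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english")
teacher = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english").to(device).eval()
student = AutoModelForSequenceClassification.from_pretrained(
    "google/bert_uncased_L-2_H-128_A-2", num_labels=2).to(device).train()

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-5)
T = 2.0  # distillation temperature (assumption)

for step in range(100):
    # 1) Synthesize pseudo task data from a short general-domain prompt.
    prompt = gen_tok(["The movie was"] * 8, return_tensors="pt").to(device)
    with torch.no_grad():
        out = generator.generate(**prompt, max_new_tokens=24, do_sample=True,
                                 top_p=0.95, pad_token_id=gen_tok.eos_token_id)
    texts = gen_tok.batch_decode(out, skip_special_tokens=True)

    # 2) Teacher provides soft labels for the synthesized texts.
    batch = cls_tok(texts, padding=True, truncation=True,
                    return_tensors="pt").to(device)
    with torch.no_grad():
        t_logits = teacher(**batch).logits

    # 3) Student mimics the teacher via temperature-scaled KL divergence.
    s_logits = student(**batch).logits
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The authors' framework goes further than this sketch: the generator itself is learned end-to-end to trade off the specificity and diversity of the synthesized text, whereas the sketch keeps it frozen.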
Related papers
- Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation [9.80836683456026]
We tackle data-to-text for isiXhosa, which is low-resource and agglutinative.
We introduce Triples-to-isiXhosa (T2X), a new dataset based on a subset of WebNLG.
We develop an evaluation framework for T2X that measures how accurately generated text describes the data.
arXiv Detail & Related papers (2024-03-12T11:53:27Z)
- Text-to-3D with Classifier Score Distillation [80.14832887529259]
In prior score distillation methods, classifier-free guidance is treated as an auxiliary trick rather than the essential component.
We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation.
We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing.
arXiv Detail & Related papers (2023-10-30T10:25:40Z)
- GECTurk: Grammatical Error Correction and Detection Dataset for Turkish [1.804922416527064]
Grammatical Error Detection and Correction (GEC) tools have proven useful for native speakers and second language learners.
Synthetic data generation is a common practice to overcome the scarcity of such data.
We present a flexible synthetic data generation pipeline for Turkish covering more than 20 expert-curated grammar and spelling rules.
arXiv Detail & Related papers (2023-09-20T14:25:44Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) conducts knowledge distillation while eliminating dependence on the original training data.
We propose a prompt-based method, termed PromptDFD, that allows us to take advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Adversarial Self-Supervised Data-Free Distillation for Text Classification [13.817252068643066]
We propose a novel two-stage data-free distillation method, named Adversarial self-Supervised Data-Free Distillation (AS-DFD).
Our framework is the first data-free distillation framework designed for NLP tasks.
arXiv Detail & Related papers (2020-10-10T02:46:06Z)
- Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling [4.525267347429154]
We condition a Transformer-based neural model on the BERT language model.
In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size.
The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset.
arXiv Detail & Related papers (2020-03-29T14:00:17Z)