Directed Acyclic Transformer Pre-training for High-quality
Non-autoregressive Text Generation
- URL: http://arxiv.org/abs/2304.11791v1
- Date: Mon, 24 Apr 2023 02:30:33 GMT
- Title: Directed Acyclic Transformer Pre-training for High-quality
Non-autoregressive Text Generation
- Authors: Fei Huang, Pei Ke, Minlie Huang
- Abstract summary: Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation.
Existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models.
We propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation.
- Score: 98.37871690400766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-AutoRegressive (NAR) text generation models have drawn much attention
because of their significantly faster decoding speed and good generation
quality in machine translation. However, in a wider range of text generation
tasks, existing NAR models lack proper pre-training, making them still far
behind the pre-trained autoregressive models. In this paper, we propose
Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task
to promote prediction consistency in NAR generation. Experiments on five text
generation tasks show that our PreDAT remarkably outperforms existing
pre-trained NAR models (+4.2 scores on average) and even achieves better
results than pre-trained autoregressive baselines in n-gram-based metrics,
along with 17 times speedup in throughput. Further analysis shows that PreDAT
benefits from the unbiased prediction order that alleviates the error
accumulation problem in autoregressive generation, which provides new insights
into the advantages of NAR generation.
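PreDAT builds on the Directed Acyclic Transformer, whose decoder (as in the original DA-Transformer work) assigns each position a token distribution plus transition scores to later positions, so an output sequence is read off by walking a path through this position DAG. The following is a minimal NumPy sketch of greedy path decoding under that view; the array names, the random toy inputs, and the purely greedy strategy are illustrative assumptions rather than the authors' implementation, which also offers more sophisticated path-search strategies.

```python
import numpy as np

def greedy_dag_decode(token_logits, transition_logits, eos_id):
    """Greedily follow one path through a directed acyclic decoder.

    token_logits:      (L, V) unnormalized token scores, one row per decoder position.
    transition_logits: (L, L) scores for moving from position i to a later position j;
                       entries with j <= i are masked to -inf so the graph stays acyclic.
    Returns the token ids collected along the chosen path.
    """
    num_positions = token_logits.shape[0]
    pos, output = 0, []
    while True:
        output.append(int(np.argmax(token_logits[pos])))   # best token at this vertex
        if output[-1] == eos_id or pos == num_positions - 1:
            break
        pos = int(np.argmax(transition_logits[pos]))        # jump to the best later vertex
    return output

# Toy usage with random scores; the strictly upper-triangular mask enforces acyclicity.
rng = np.random.default_rng(0)
num_positions, vocab_size, eos_id = 8, 10, 9
token_logits = rng.normal(size=(num_positions, vocab_size))
mask = np.triu(np.ones((num_positions, num_positions)), k=1) > 0
transition_logits = np.where(mask, rng.normal(size=(num_positions, num_positions)), -np.inf)
print(greedy_dag_decode(token_logits, transition_logits, eos_id))
```

Because all token distributions are produced in parallel and the path only selects among them, no prediction conditions on earlier output tokens, which is the property the abstract credits for alleviating autoregressive error accumulation.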
Related papers
- UT5: Pretraining Non autoregressive T5 with unrolled denoising [9.656399724144192]
We studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising.
We showed SoTA results on downstream generation tasks such as SQuAD question generation and XSum summarization.
arXiv Detail & Related papers (2023-11-14T21:28:10Z)
- Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation [10.306522595622651]
We introduce Dynamic Scheduled Sampling with Imitation Loss (DySI), which maintains the sampling schedule based solely on training-time accuracy.
DySI achieves notable improvements on standard machine translation benchmarks, and significantly improves the robustness of other text generation models.
arXiv Detail & Related papers (2023-01-31T16:41:06Z)
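As a rough illustration of the DySI idea above, the sketch below keeps a running estimate of training-time token accuracy and uses it as the probability of feeding the decoder its own predictions instead of gold tokens. The class name, the exponential-moving-average schedule, and the mixing rule are assumptions made for this sketch; DySI's actual schedule and its imitation loss are specified in the paper.

```python
import random

class AccuracyDrivenSchedule:
    """Scheduled sampling whose mixing rate tracks training-time accuracy.

    Keeps an exponential moving average of token accuracy; the higher the
    accuracy, the more often the decoder is fed its own predictions instead
    of gold tokens. A hypothetical stand-in for DySI's actual schedule.
    """

    def __init__(self, momentum=0.99):
        self.momentum = momentum
        self.accuracy = 0.0  # running estimate of training token accuracy

    def update(self, batch_accuracy):
        self.accuracy = self.momentum * self.accuracy + (1.0 - self.momentum) * batch_accuracy

    def mix_inputs(self, gold_tokens, predicted_tokens):
        # Replace each gold token by the model's prediction with probability
        # equal to the current accuracy estimate.
        return [pred if random.random() < self.accuracy else gold
                for gold, pred in zip(gold_tokens, predicted_tokens)]

# Toy usage: after many batches at ~80% accuracy, most decoder inputs come
# from the model's own predictions rather than the gold sequence.
schedule = AccuracyDrivenSchedule()
for _ in range(300):
    schedule.update(batch_accuracy=0.8)
print(schedule.mix_inputs(gold_tokens=[1, 2, 3, 4], predicted_tokens=[1, 9, 3, 7]))
```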
- A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation [135.84684279852098]
Non-Autoregressive (NAR) models significantly underperform Autoregressive (AR) models on various language generation tasks.
Among NAR models, BANG is the first large-scale model pre-trained on an unlabeled English raw-text corpus.
We propose a novel self-paced mixed distillation method to further improve the generation quality of BANG.
arXiv Detail & Related papers (2022-05-23T09:54:53Z)
- A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond [145.43029264191543]
Non-autoregressive (NAR) generation was first proposed in neural machine translation (NMT) to speed up inference.
While NAR generation can significantly accelerate inference for machine translation, the speedup comes at the cost of translation accuracy compared to autoregressive (AR) generation.
Many new models and algorithms have been proposed to bridge the accuracy gap between NAR and AR generation.
arXiv Detail & Related papers (2022-04-20T07:25:22Z)
- Improving Non-autoregressive Generation with Mixup Training [51.61038444990301]
We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target.
Our experiments on three generation benchmarks, including question generation, summarization, and paraphrase generation, show that the proposed framework achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-10-21T13:04:21Z)
- A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation [59.64193903397301]
Non-autoregressive (NAR) models generate multiple output tokens in a sequence simultaneously, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
We conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR).
The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances.
arXiv Detail & Related papers (2021-10-11T13:05:06Z)
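The speed/accuracy trade-off noted in the entry above comes from replacing a token-by-token decoding loop with a single parallel pass over all output positions; the NAR call gives up left-to-right conditioning, which is where the accuracy drop originates. The sketch below contrasts the two; the `step_fn` and `parallel_fn` callables are hypothetical stand-ins for a real encoder-decoder, replaced here by random scores just to keep the snippet runnable.

```python
import numpy as np

def ar_decode(step_fn, bos_id, eos_id, max_len):
    """Autoregressive decoding: one forward call per generated token."""
    tokens = [bos_id]
    for _ in range(max_len):
        next_id = int(np.argmax(step_fn(tokens)))   # scores for the next token only
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]

def nar_decode(parallel_fn, length):
    """Non-autoregressive decoding: one forward call fills all positions at once."""
    logits = parallel_fn(length)                    # (length, vocab) from a single pass
    return [int(i) for i in np.argmax(logits, axis=-1)]

# Toy stand-ins for the model calls (random scores) just to make the sketch runnable.
rng = np.random.default_rng(0)
vocab_size = 10
print(ar_decode(lambda prefix: rng.normal(size=vocab_size), bos_id=0, eos_id=1, max_len=5))
print(nar_decode(lambda n: rng.normal(size=(n, vocab_size)), length=5))
```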
- Non-Autoregressive Text Generation with Pre-trained Language Models [40.50508206201288]
We show that BERT can be employed as the backbone of a NAG model to greatly improve performance.
We devise mechanisms to alleviate the two common problems of vanilla NAG models.
We propose a new decoding strategy, ratio-first, for applications where the output lengths can be approximately estimated beforehand.
arXiv Detail & Related papers (2021-02-16T15:30:33Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
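POINTER's progressive, parallel insertion can be pictured with a small loop: starting from the hard-constraint keywords, each round proposes at most one token for every gap between adjacent tokens, and generation stops once every gap declines to insert. The `propose_for_gap` callable and the toy rule below are hypothetical stand-ins for the pre-trained insertion model, and the sketch omits details of how the real model decides to stop inserting.

```python
def progressive_insertion(keywords, propose_for_gap, max_rounds=10):
    """Coarse-to-fine generation by parallel insertion between existing tokens.

    keywords:        initial hard constraints, e.g. ["sun", "beach"].
    propose_for_gap: callable (left_token, right_token) -> token or None,
                     a stand-in for the insertion model (None = no insertion).
    """
    tokens = list(keywords)
    for _ in range(max_rounds):
        # Propose one token for every gap in parallel (here: independently).
        proposals = [propose_for_gap(tokens[i], tokens[i + 1])
                     for i in range(len(tokens) - 1)]
        if all(p is None for p in proposals):
            break  # no gap wants another token: generation has converged
        new_tokens = []
        for i, tok in enumerate(tokens[:-1]):
            new_tokens.append(tok)
            if proposals[i] is not None:
                new_tokens.append(proposals[i])
        new_tokens.append(tokens[-1])
        tokens = new_tokens
    return tokens

# Toy rule standing in for the model: insert "and" between two bare keywords once.
toy_rule = lambda left, right: "and" if (left, right) == ("sun", "beach") else None
print(progressive_insertion(["sun", "beach"], toy_rule))  # ['sun', 'and', 'beach']
```

In the real model, more informative tokens tend to be inserted in earlier rounds and finer details later, which is the coarse-to-fine hierarchy the summary refers to.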