UT5: Pretraining Non autoregressive T5 with unrolled denoising
- URL: http://arxiv.org/abs/2311.08552v1
- Date: Tue, 14 Nov 2023 21:28:10 GMT
- Title: UT5: Pretraining Non autoregressive T5 with unrolled denoising
- Authors: Mahmoud G. Salem, Jiayu Ye, Chu-Cheng Lin, Frederick Liu
- Abstract summary: We studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising.
We showed that it achieves SoTA results on downstream generation tasks such as SQuAD question generation and XSum.
- Score: 9.656399724144192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Transformer-based Large Language Models have made great
strides in natural language generation. However, to decode K tokens, an
autoregressive model needs K sequential forward passes, which may be a
performance bottleneck for large language models. Much non-autoregressive (NAR)
research aims to address this sequentiality bottleneck, although most of it has
focused on dedicated architectures evaluated on supervised benchmarks. In this work, we
studied unsupervised pretraining for non-autoregressive T5 models via unrolled
denoising and showed that it achieves SoTA results in downstream generation tasks such as
SQuAD question generation and XSum.
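To make the sequentiality bottleneck concrete, the sketch below (not the paper's implementation) contrasts the two decoding regimes. `dummy_model` is a hypothetical stand-in for any seq2seq Transformer and returns random logits; the only point is that decoding K tokens autoregressively takes K dependent forward passes, while a non-autoregressive decoder fills all K positions in a single pass.

```python
import numpy as np

VOCAB = 100  # toy vocabulary size

def dummy_model(src_ids, tgt_slots):
    """Hypothetical stand-in for a seq2seq Transformer forward pass:
    one row of logits per target slot (random here, for illustration)."""
    rng = np.random.default_rng(len(src_ids) + len(tgt_slots))
    return rng.standard_normal((len(tgt_slots), VOCAB))

def autoregressive_decode(src_ids, k):
    """k tokens -> k dependent forward passes: step i conditions on tokens 0..i-1."""
    out = []
    for _ in range(k):
        logits = dummy_model(src_ids, out + [None])  # None marks the next slot
        out.append(int(logits[-1].argmax()))
    return out

def nar_decode(src_ids, k):
    """All k positions predicted from placeholder slots in a single forward pass."""
    logits = dummy_model(src_ids, [None] * k)
    return [int(row.argmax()) for row in logits]

print(autoregressive_decode([1, 2, 3], 5))  # 5 sequential passes
print(nar_decode([1, 2, 3], 5))             # 1 parallel pass
```

The NAR pass is what UT5 targets; output quality then hinges on the pretraining recipe (unrolled denoising) rather than on the decoding loop itself.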
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
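As a rough illustration of what iteratively-refined parallel decoding means here, the toy sketch below follows the general mask-predict recipe rather than this paper's exact procedure; each refinement step plays the role of one transition of the Markov chain mentioned above, and `dummy_mlm`, the reserved mask id, and the linear re-masking schedule are all assumptions.

```python
import numpy as np

VOCAB, MASK = 100, 0  # id 0 is reserved for [MASK] in this toy

def dummy_mlm(tokens):
    """Hypothetical masked-LM stand-in: per-position log-probabilities."""
    rng = np.random.default_rng(abs(hash(tuple(int(t) for t in tokens))) % 2**32)
    logits = rng.standard_normal((len(tokens), VOCAB))
    return logits - np.log(np.exp(logits).sum(-1, keepdims=True))

def iterative_parallel_decode(length, steps=4):
    """Start from all masks, fill every masked position in parallel,
    then re-mask the least confident tokens and repeat."""
    tokens = np.full(length, MASK)
    conf = np.full(length, -np.inf)
    for t in range(steps):
        log_probs = dummy_mlm(tokens)
        masked = tokens == MASK
        # predict only masked slots, never emitting the mask id itself
        tokens[masked] = log_probs[masked][:, 1:].argmax(-1) + 1
        conf[masked] = log_probs[masked][:, 1:].max(-1)
        n_remask = length * (steps - t - 1) // steps  # linear schedule
        if n_remask:
            worst = np.argsort(conf)[:n_remask]
            tokens[worst], conf[worst] = MASK, -np.inf
    return tokens.tolist()

print(iterative_parallel_decode(8))
```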
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - A Text-to-Text Model for Multilingual Offensive Language Identification [19.23565690468299]
This study presents the first pre-trained encoder-decoder model for offensive language identification based on text-to-text transformers (T5).
Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, in multiple English benchmarks.
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5.
arXiv Detail & Related papers (2023-12-06T09:37:27Z) - Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation [98.37871690400766]
Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation.
Existing NAR models lack proper pre-training, leaving them far behind pre-trained autoregressive models.
We propose Pre-trained Directed Acyclic Transformer to promote prediction consistency in NAR generation.
arXiv Detail & Related papers (2023-04-24T02:30:33Z) - EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start [21.4394742421462]
EdiT5 is a novel semi-autoregressive text-editing approach.
It combines the strengths of non-autoregressive text-editing and autoregressive decoding.
arXiv Detail & Related papers (2022-05-24T17:13:22Z) - Improving Non-autoregressive Generation with Mixup Training [51.61038444990301]
We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target.
Our experiments on three generation benchmarks (question generation, summarization, and paraphrase generation) show that the proposed framework achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-10-21T13:04:21Z) - The Power of Prompt Tuning for Low-Resource Semantic Parsing [10.37371743879877]
We investigate prompt tuning for semantic parsing.
For large T5 models, we find that prompt tuning significantly outperforms fine-tuning in the low-data regime.
This last result is surprising as it suggests that large T5 models can be modulated to generate sequences far from the pre-training distribution.
arXiv Detail & Related papers (2021-10-16T09:33:09Z) - EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks [9.141586109808895]
We study fine-tuning pre-trained encoder-decoder models such as T5.
Our experimental results show that EncT5, with less than half the parameters of T5, performs similarly to T5 models on the GLUE benchmark.
arXiv Detail & Related papers (2021-10-16T00:50:08Z) - Non-Autoregressive Translation by Learning Target Categorical Codes [59.840510037250944]
We propose CNAT, which learns implicit categorical codes as latent variables for non-autoregressive decoding.
Experimental results show that our model achieves comparable or better performance on machine translation tasks.
arXiv Detail & Related papers (2021-03-21T14:12:34Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Aligned Cross Entropy for Non-Autoregressive Machine Translation [120.15069387374717]
We propose aligned cross entropy (AXE) as an alternative loss function for training of non-autoregressive models.
AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks.
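For intuition, the sketch below implements a heavily simplified monotonic-alignment loss in the spirit of AXE, not the paper's exact recurrence: every target token must be matched to one prediction slot, slots stay in order, and each unmatched slot pays the cost of emitting an assumed blank token. The blank id and cost choices are illustrative assumptions.

```python
import numpy as np

def aligned_xent_sketch(log_probs, target, blank_id=0):
    """Simplified monotonic-alignment cross entropy.
    log_probs: [M, V] per-slot log-probabilities; target: N token ids (N <= M).
    dp[i, j] = best cost after consuming i target tokens with the first j slots."""
    M = log_probs.shape[0]
    N = len(target)
    dp = np.full((N + 1, M + 1), np.inf)
    dp[0, 0] = 0.0
    for i in range(N + 1):
        for j in range(1, M + 1):
            # slot j-1 left unaligned: it must emit the blank token
            dp[i, j] = dp[i, j - 1] - log_probs[j - 1, blank_id]
            if i > 0:
                # slot j-1 aligned to target token i-1
                dp[i, j] = min(dp[i, j],
                               dp[i - 1, j - 1] - log_probs[j - 1, target[i - 1]])
    return dp[N, M]  # loss under the best monotonic alignment

# usage: random log-probabilities over 6 slots, a 3-token target
rng = np.random.default_rng(0)
logits = rng.standard_normal((6, 10))
log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
print(aligned_xent_sketch(log_probs, [3, 7, 2]))
```

Unlike position-wise cross entropy, this kind of loss cares about token order but not exact positions, which is why alignment-based objectives tolerate the small position shifts typical of NAR outputs.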
arXiv Detail & Related papers (2020-04-03T16:24:47Z)