EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start
- URL: http://arxiv.org/abs/2205.12209v1
- Date: Tue, 24 May 2022 17:13:22 GMT
- Title: EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start
- Authors: Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn
- Abstract summary: EdiT5 is a novel semi-autoregressive text-editing approach.
It combines the strengths of non-autoregressive text-editing and autoregressive decoding.
- Score: 21.4394742421462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present EdiT5 - a novel semi-autoregressive text-editing approach designed
to combine the strengths of non-autoregressive text-editing and autoregressive
decoding. EdiT5 is faster at inference time than conventional
sequence-to-sequence (seq2seq) models, while being capable of modeling flexible
input-output transformations.
This is achieved by decomposing the generation process into three sub-tasks:
(1) tagging to decide on the subset of input tokens to be preserved in the
output, (2) re-ordering to define their order in the output text, and (3)
insertion to infill the missing tokens that are not present in the input. The
tagging and re-ordering steps, which are responsible for generating the largest
portion of the output, are non-autoregressive, while the insertion uses an
autoregressive decoder.
Depending on the task, EdiT5 requires significantly fewer autoregressive
steps, demonstrating speedups of up to 25x compared to classic seq2seq
models. Quality-wise, EdiT5 is initialized with a pre-trained T5 checkpoint,
yielding performance comparable to T5 in high-resource settings and clearly
outperforming it in low-resource settings when evaluated on three NLG tasks:
Sentence Fusion, Grammatical Error Correction, and Decontextualization.
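To make the three-step decomposition concrete, here is a minimal toy sketch of how the predicted components could be combined into an output sequence. It is not the authors' implementation: the representations and names used here (apply_edit, keep_tags, new_order, insertions) are illustrative assumptions based only on the abstract; in the actual model, tagging and re-ordering are predicted non-autoregressively, and only the inserted tokens come from the autoregressive decoder.

```python
# Toy illustration of EdiT5's three sub-tasks (tagging, re-ordering, insertion).
# A hedged sketch, not the authors' code: the tag/order/insertion formats below
# are assumptions made purely for illustration.

def apply_edit(source_tokens, keep_tags, new_order, insertions):
    """Realize an edit from the three predicted components.

    source_tokens: list of input tokens
    keep_tags:     one 0/1 tag per source token (1 = keep), non-autoregressive
    new_order:     output positions of the kept tokens, non-autoregressive
    insertions:    dict mapping an output slot index to tokens generated
                   autoregressively to fill in material missing from the input
    """
    # (1) Tagging: keep only the tokens tagged with 1.
    kept = [tok for tok, tag in zip(source_tokens, keep_tags) if tag == 1]

    # (2) Re-ordering: place the kept tokens in their predicted output order.
    reordered = [kept[i] for i in new_order]

    # (3) Insertion: an autoregressive decoder fills the remaining gaps.
    output = []
    for slot, tok in enumerate(reordered):
        output.extend(insertions.get(slot, []))
        output.append(tok)
    output.extend(insertions.get(len(reordered), []))
    return output


# Example: fuse "He went home . He was tired ." into a single sentence.
src = ["He", "went", "home", ".", "He", "was", "tired", "."]
tags = [1, 1, 1, 0, 0, 1, 1, 1]      # drop the first "." and the second "He"
order = [0, 1, 2, 3, 4, 5]           # kept tokens stay in source order here
ins = {3: ["because", "he"]}         # the only autoregressively generated tokens
print(" ".join(apply_edit(src, tags, order, ins)))
# -> He went home because he was tired .
```

In this toy example only two tokens ("because he") have to be generated autoregressively; the rest of the output is copied and re-ordered from the input, which is what drives the reported speedups over full seq2seq decoding.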
Related papers
- MrT5: Dynamic Token Merging for Efficient Byte-level Language Models [50.46453950887946]
This work introduces MrT5 (MergeT5), a more efficient variant of ByT5.
MrT5 integrates a token deletion mechanism in its encoder to dynamically shorten the input sequence length.
When trained on English text, MrT5 demonstrates the capability to transfer its deletion feature zero-shot across several languages.
arXiv Detail & Related papers (2024-10-28T06:14:12Z) - UT5: Pretraining Non autoregressive T5 with unrolled denoising [9.656399724144192]
We studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising.
We showed its SoTA results in downstream generation tasks such as SQuAD question generation and XSum.
arXiv Detail & Related papers (2023-11-14T21:28:10Z) - Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for
Text-to-SQL Parsing [56.232873134174056]
One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases.
In this work, we explore ways to further augment the pre-trained text-to-text transformer model with specialized components for text-to-SQL parsing.
To this end, we propose a new architecture, GRAPHIX-T5, which augments T5 with specially-designed graph-aware layers.
arXiv Detail & Related papers (2023-01-18T13:29:05Z) - Lossless Acceleration for Seq2seq Generation with Aggressive Decoding [74.12096349944497]
Aggressive Decoding is a novel decoding algorithm for seq2seq generation.
Our approach aims to yield generation results identical to (or better than) those of autoregressive decoding.
We test Aggressive Decoding with the most popular 6-layer Transformer model on GPU across multiple seq2seq tasks.
arXiv Detail & Related papers (2022-05-20T17:59:00Z) - Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text
Models [10.645591218689058]
We provide the first exploration of sentence embeddings from text-to-text transformers (T5).
We investigate three methods for extracting T5 sentence embeddings.
Our encoder-only models outperform BERT-based sentence embeddings on both transfer tasks and semantic textual similarity.
arXiv Detail & Related papers (2021-08-19T18:58:02Z) - NT5?! Training T5 to Perform Numerical Reasoning [0.8827543048499855]
Numerical reasoning over text (NRoT) presents unique challenges that are not well addressed by existing pre-training objectives.
We show that training the T5 multitasking framework with multiple numerical reasoning datasets of increasing difficulty can be achieved without manually engineering partitioned functionality.
arXiv Detail & Related papers (2021-04-15T08:34:44Z) - Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting [54.03356526990088]
We propose Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.
SSR provides more fine-grained learning signals for text representations by supervising the model to rewrite imperfect spans to ground truth.
Our experiments with T5 models on various seq2seq tasks show that SSR can substantially improve seq2seq pre-training.
arXiv Detail & Related papers (2021-01-02T10:27:11Z) - Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
arXiv Detail & Related papers (2020-06-01T17:52:15Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
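As a rough illustration of POINTER's progressive, insertion-based scheme, the sketch below starts from hard lexical constraints and inserts tokens between existing ones in parallel rounds until the sequence stops changing. It is a hedged toy, not the POINTER code: the gap predictor is a hard-coded lookup, and all names (insertion_round, generate, NO_INSERT) are assumptions made for illustration.

```python
# Toy sketch of progressive, insertion-based generation in the spirit of POINTER.
# The real system predicts, for every gap between adjacent tokens in parallel,
# either a token to insert or a special no-insertion marker, and repeats
# coarse-to-fine until no gap changes.

NO_INSERT = None  # stand-in for the "do not insert anything in this gap" prediction

def insertion_round(tokens, predict_gap):
    """One parallel round: propose at most one new token for every gap."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i < len(tokens) - 1:
            filler = predict_gap(tokens[i], tokens[i + 1])
            if filler is not NO_INSERT:
                out.append(filler)
    return out

def generate(constraint_tokens, predict_gap, max_rounds=8):
    """Coarse-to-fine generation: start from the hard lexical constraints and
    repeatedly insert between existing tokens until the sequence is stable."""
    tokens = list(constraint_tokens)
    for _ in range(max_rounds):
        new_tokens = insertion_round(tokens, predict_gap)
        if new_tokens == tokens:
            break
        tokens = new_tokens
    return tokens

# Hard-coded stand-in for a trained insertion model (purely illustrative).
fills = {("cat", "mat"): "on", ("cat", "on"): "sat", ("on", "mat"): "the"}
predict = lambda left, right: fills.get((left, right), NO_INSERT)

print(" ".join(generate(["cat", "mat"], predict)))
# round 1: "cat on mat" -> round 2: "cat sat on the mat" -> stable
```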
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.