Cascaded Text Generation with Markov Transformers
- URL: http://arxiv.org/abs/2006.01112v2
- Date: Sat, 5 Dec 2020 05:26:19 GMT
- Title: Cascaded Text Generation with Markov Transformers
- Authors: Yuntian Deng, Alexander M. Rush
- Abstract summary: Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing a competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
- Score: 122.76100449018061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The two dominant approaches to neural text generation are fully
autoregressive models, using serial beam search decoding, and
non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time
generation. Noting that conditional random fields with bounded context can be
decoded in parallel, we propose an efficient cascaded decoding approach for
generating high-quality output. To parameterize this cascade, we introduce a
Markov transformer, a variant of the popular fully autoregressive model that
allows us to simultaneously decode with specific autoregressive context
cutoffs. This approach requires only a small modification from standard
autoregressive training, while showing a competitive accuracy/speed tradeoff
compared to existing methods on five machine translation datasets.
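The cascade can be read as a coarse-to-fine pruning procedure: a context-free pass proposes top candidates at every position in parallel, each later pass rescores the surviving candidates with a longer bounded context and prunes them, and a final Viterbi-style pass over the pruned lattice selects the output. The snippet below is a minimal illustrative sketch of that pruning idea under toy settings, not the authors' implementation: the Markov-transformer scores are replaced by random stubs, and only two rounds (unigram, then bigram) are shown.

```python
# Minimal sketch of cascaded decoding with a bounded-context (Markov) scorer.
# Not the paper's implementation: the scoring functions are random stubs, and
# only two cascade rounds (unigram -> bigram) are shown.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, KEEP = 50, 8, 5   # toy vocabulary size, output length, candidates kept

def unigram_scores(length, vocab):
    # Stand-in for 0th-order (no-context) token scores from the model.
    return rng.standard_normal((length, vocab))

def bigram_scores(prev_ids, cur_ids, pos):
    # Stand-in for 1st-order (one-token-context) pair scores at position `pos`.
    # The stub ignores its arguments; a real scorer would condition on them.
    return rng.standard_normal((len(prev_ids), len(cur_ids)))

# Round 0: every position keeps its top-K tokens independently (fully parallel).
uni = unigram_scores(LENGTH, VOCAB)
cands = [np.argsort(-uni[t])[:KEEP] for t in range(LENGTH)]

# Round 1: rescore all surviving bigrams and prune low-scoring pairs.
# Each position boundary is handled independently, so this loop can run in parallel.
kept_pairs = []
for t in range(1, LENGTH):
    pair = bigram_scores(cands[t - 1], cands[t], t)
    rows, cols = np.unravel_index(np.argsort(-pair, axis=None)[:KEEP], pair.shape)
    kept_pairs.append({(int(cands[t - 1][i]), int(cands[t][j])) for i, j in zip(rows, cols)})

# A final max-product (Viterbi-style) pass over the pruned lattice would recover
# the single highest-scoring sequence; it is omitted here for brevity.
print("surviving bigrams per boundary:", [len(p) for p in kept_pairs])
```

In the paper, the scores at every round come from a single Markov transformer evaluated with different autoregressive context cutoffs; here they are random placeholders purely to show the control flow.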
Related papers
- Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding [60.188309982690335]
We propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.
By introducing a probabilistic convergence criterion, our SJD accelerates the inference of auto-regressive text-to-image generation while maintaining the randomness in sampling-based token decoding.
arXiv Detail & Related papers (2024-10-02T16:05:27Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving a 2-3x speedup in machine translation with minimal sacrifice in quality (a generic sketch of this mask-and-refine loop appears after the related-papers list).
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Hierarchical Attention Encoder Decoder [2.4366811507669115]
Autoregressive models can generate complex and novel sequences with many real-world applications.
However, these models must generate outputs token by token, which becomes time-consuming for long sequences.
We propose a model based on the Hierarchical Recurrent Decoder architecture.
arXiv Detail & Related papers (2023-06-01T18:17:23Z)
- Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z)
- Regression Transformer: Concurrent Conditional Generation and Regression by Blending Numerical and Textual Tokens [3.421506449201873]
The Regression Transformer (RT) casts continuous properties as sequences of numerical tokens and encodes them jointly with conventional tokens.
We propose several extensions to the XLNet objective and adopt an alternating training scheme to concurrently optimize property prediction and conditional text generation.
This is particularly useful for property-driven, local exploration of chemical or protein space.
arXiv Detail & Related papers (2022-02-01T08:57:31Z)
- Non-Autoregressive Translation by Learning Target Categorical Codes [59.840510037250944]
We propose CNAT, which implicitly learns categorical codes as latent variables for non-autoregressive decoding.
Experimental results show that our model achieves comparable or better performance on machine translation tasks.
arXiv Detail & Related papers (2021-03-21T14:12:34Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
- Non-Autoregressive Machine Translation with Disentangled Context Transformer [70.95181466892795]
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens.
We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts.
Our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
arXiv Detail & Related papers (2020-01-15T05:32:18Z)
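As noted in the Generative Masked Language Models entry above, iteratively-refined parallel decoding is typically realized as a mask-and-repredict loop. The sketch below shows that generic loop with invented names and a random stub in place of the masked language model; it illustrates the general pattern only, not the specific T5 adaptation described in that paper.

```python
# Generic mask-predict style iterative parallel refinement (a sketch, not the
# cited paper's T5 adaptation). `fill_probs` is a random stub standing in for a
# masked language model that returns token probabilities for every position.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, MASK_ID, ROUNDS = 50, 8, 0, 4   # invented toy settings

def fill_probs(tokens):
    # Stub: a real GMLM would condition on the unmasked tokens here.
    logits = rng.standard_normal((len(tokens), VOCAB))
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

tokens = np.full(LENGTH, MASK_ID)            # start from a fully masked sequence

for r in range(ROUNDS):
    probs = fill_probs(tokens)               # predict every position in parallel
    preds, conf = probs.argmax(axis=-1), probs.max(axis=-1)
    tokens = preds
    # Re-mask the least confident positions; fewer positions stay masked each round.
    n_mask = int(LENGTH * (ROUNDS - 1 - r) / ROUNDS)
    tokens[np.argsort(conf)[:n_mask]] = MASK_ID

print("final tokens:", tokens)
```

Each round predicts all positions at once and re-masks only the least confident ones, so the number of model calls is a small constant rather than the sequence length.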
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.