Cascaded Text Generation with Markov Transformers
- URL: http://arxiv.org/abs/2006.01112v2
- Date: Sat, 5 Dec 2020 05:26:19 GMT
- Title: Cascaded Text Generation with Markov Transformers
- Authors: Yuntian Deng, Alexander M. Rush
- Abstract summary: Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing a competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
- Score: 122.76100449018061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The two dominant approaches to neural text generation are fully
autoregressive models, using serial beam search decoding, and
non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time
generation. Noting that conditional random fields with bounded context can be
decoded in parallel, we propose an efficient cascaded decoding approach for
generating high-quality output. To parameterize this cascade, we introduce a
Markov transformer, a variant of the popular fully autoregressive model that
allows us to simultaneously decode with specific autoregressive context
cutoffs. This approach requires only a small modification from standard
autoregressive training, while showing a competitive accuracy/speed tradeoff
compared to existing methods on five machine translation datasets.
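The cascade can be read as a coarse-to-fine pruning procedure: a context-free pass proposes top candidates at every position in parallel, each later pass rescores the surviving candidates with a longer bounded context and prunes them, and a final Viterbi-style pass over the pruned lattice selects the output. The snippet below is a minimal illustrative sketch of that pruning idea under toy settings, not the authors' implementation: the Markov-transformer scores are replaced by random stubs, and only two rounds (unigram, then bigram) are shown.

```python
# Minimal sketch of cascaded decoding with a bounded-context (Markov) scorer.
# Not the paper's implementation: the scoring functions are random stubs, and
# only two cascade rounds (unigram -> bigram) are shown.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, KEEP = 50, 8, 5   # toy vocabulary size, output length, candidates kept

def unigram_scores(length, vocab):
    # Stand-in for 0th-order (no-context) token scores from the model.
    return rng.standard_normal((length, vocab))

def bigram_scores(prev_ids, cur_ids, pos):
    # Stand-in for 1st-order (one-token-context) pair scores at position `pos`.
    # The stub ignores its arguments; a real scorer would condition on them.
    return rng.standard_normal((len(prev_ids), len(cur_ids)))

# Round 0: every position keeps its top-K tokens independently (fully parallel).
uni = unigram_scores(LENGTH, VOCAB)
cands = [np.argsort(-uni[t])[:KEEP] for t in range(LENGTH)]

# Round 1: rescore all surviving bigrams and prune low-scoring pairs.
# Each position boundary is handled independently, so this loop can run in parallel.
kept_pairs = []
for t in range(1, LENGTH):
    pair = bigram_scores(cands[t - 1], cands[t], t)
    rows, cols = np.unravel_index(np.argsort(-pair, axis=None)[:KEEP], pair.shape)
    kept_pairs.append({(int(cands[t - 1][i]), int(cands[t][j])) for i, j in zip(rows, cols)})

# A final max-product (Viterbi-style) pass over the pruned lattice would recover
# the single highest-scoring sequence; it is omitted here for brevity.
print("surviving bigrams per boundary:", [len(p) for p in kept_pairs])
```

In the paper, the scores at every round come from a single Markov transformer evaluated with different autoregressive context cutoffs; here they are random placeholders purely to show the control flow.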
Related papers
- Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding [60.188309982690335]
We propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.
By introducing a probabilistic convergence criterion, our SJD accelerates the inference of auto-regressive text-to-image generation while maintaining the randomness in sampling-based token decoding.
arXiv Detail & Related papers (2024-10-02T16:05:27Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving a 2-3x speedup in machine translation with minimal sacrifice in quality (a generic sketch of this mask-and-refine loop appears after the related-papers list).
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Hierarchical Attention Encoder Decoder [2.4366811507669115]
Autoregressive models can generate complex and novel sequences with many real-world applications.
However, these models must generate outputs token by token, which becomes time-consuming for long sequences.
We propose a model based on the Hierarchical Recurrent Decoder architecture.
arXiv Detail & Related papers (2023-06-01T18:17:23Z)
- Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z)
- Regression Transformer: Concurrent Conditional Generation and Regression by Blending Numerical and Textual Tokens [3.421506449201873]
The Regression Transformer (RT) casts continuous properties as sequences of numerical tokens and encodes them jointly with conventional tokens.
We propose several extensions to the XLNet objective and adopt an alternating training scheme to concurrently optimize property prediction and conditional text generation.
This is particularly useful for property-driven, local exploration of chemical or protein space.
arXiv Detail & Related papers (2022-02-01T08:57:31Z)
- Non-Autoregressive Translation by Learning Target Categorical Codes [59.840510037250944]
We propose CNAT, which implicitly learns categorical codes as latent variables for non-autoregressive decoding.
Experimental results show that our model achieves comparable or better performance on machine translation tasks.
arXiv Detail & Related papers (2021-03-21T14:12:34Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
- Non-Autoregressive Machine Translation with Disentangled Context Transformer [70.95181466892795]
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens.
We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts.
Our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
arXiv Detail & Related papers (2020-01-15T05:32:18Z)
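As noted in the Generative Masked Language Models entry above, iteratively-refined parallel decoding is typically realized as a mask-and-repredict loop. The sketch below shows that generic loop with invented names and a random stub in place of the masked language model; it illustrates the general pattern only, not the specific T5 adaptation described in that paper.

```python
# Generic mask-predict style iterative parallel refinement (a sketch, not the
# cited paper's T5 adaptation). `fill_probs` is a random stub standing in for a
# masked language model that returns token probabilities for every position.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, MASK_ID, ROUNDS = 50, 8, 0, 4   # invented toy settings

def fill_probs(tokens):
    # Stub: a real GMLM would condition on the unmasked tokens here.
    logits = rng.standard_normal((len(tokens), VOCAB))
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

tokens = np.full(LENGTH, MASK_ID)            # start from a fully masked sequence

for r in range(ROUNDS):
    probs = fill_probs(tokens)               # predict every position in parallel
    preds, conf = probs.argmax(axis=-1), probs.max(axis=-1)
    tokens = preds
    # Re-mask the least confident positions; fewer positions stay masked each round.
    n_mask = int(LENGTH * (ROUNDS - 1 - r) / ROUNDS)
    tokens[np.argsort(conf)[:n_mask]] = MASK_ID

print("final tokens:", tokens)
```

Each round predicts all positions at once and re-masks only the least confident ones, so the number of model calls is a small constant rather than the sequence length.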
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.