Imputer: Sequence Modelling via Imputation and Dynamic Programming
- URL: http://arxiv.org/abs/2002.08926v2
- Date: Wed, 22 Apr 2020 17:32:18 GMT
- Title: Imputer: Sequence Modelling via Imputation and Dynamic Programming
- Authors: William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi,
Navdeep Jaitly
- Abstract summary: Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the number of input or output tokens.
We present a tractable dynamic programming training algorithm, which yields a lower bound on the log marginal likelihood.
- Score: 101.5705527605346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the Imputer, a neural sequence model that generates
output sequences iteratively via imputations. The Imputer is an iterative
generative model, requiring only a constant number of generation steps
independent of the number of input or output tokens. The Imputer can be trained
to approximately marginalize over all possible alignments between the input and
output sequences, and all possible generation orders. We present a tractable
dynamic programming training algorithm, which yields a lower bound on the log
marginal likelihood. When applied to end-to-end speech recognition, the Imputer
outperforms prior non-autoregressive models and achieves competitive results to
autoregressive models. On LibriSpeech test-other, the Imputer achieves 11.1
WER, outperforming CTC at 13.0 WER and seq2seq at 12.5 WER.
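The training objective described above marginalizes over monotonic alignments with a dynamic program, in the spirit of CTC. As a rough sketch of the recursion such an objective builds on, and not the Imputer's exact algorithm (which additionally conditions on a partially generated alignment), the snippet below computes the standard CTC log marginal over all alignments of a target to T input frames; the function and variable names are my own.

```python
# A rough sketch (not the paper's exact algorithm) of the CTC-style dynamic
# program that sums over all monotonic alignments of a target sequence to T
# input frames. The Imputer's lower bound is computed over the same kind of
# alignment lattice, but additionally conditions on a partially observed
# alignment; that part is omitted here.
import numpy as np

NEG_INF = -1e30  # stand-in for log(0)

def logsumexp(*xs):
    m = max(xs)
    if m <= NEG_INF:
        return NEG_INF
    return m + np.log(sum(np.exp(x - m) for x in xs))

def ctc_log_marginal(log_probs, target, blank=0):
    """log p(target | input), summed over all CTC alignments.

    log_probs: (T, V) per-frame log distributions, blank at index `blank`.
    target:    list of label ids, none equal to `blank`.
    """
    T, _ = log_probs.shape
    # Blank-augmented target: [blank, y1, blank, y2, ..., yU, blank]
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    alpha = np.full((T, S), NEG_INF)          # alpha[t, s]: prefix score
    alpha[0, 0] = log_probs[0, blank]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            incoming = logsumexp(
                alpha[t - 1, s],
                alpha[t - 1, s - 1] if s >= 1 else NEG_INF,
                # Skipping the previous blank is allowed only between
                # distinct non-blank labels.
                alpha[t - 1, s - 2]
                if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]
                else NEG_INF,
            )
            alpha[t, s] = incoming + log_probs[t, ext[s]]

    # Valid alignments end in the last label or the trailing blank.
    return logsumexp(alpha[T - 1, S - 1],
                     alpha[T - 1, S - 2] if S > 1 else NEG_INF)
```

At decoding time, the constant-step property comes from block-parallel imputation: the alignment is split into blocks of size B and one token per block is imputed at each step, so generation finishes in B steps regardless of the input or output length.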
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome the cost of purely sequential, token-by-token refinement.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match the downstream use case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts the hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition [69.68154370877615]
The non-autoregressive (NAR) models remove the temporal dependency between output tokens and can predict the entire output sequence in as few as one step.
To address the remaining gap between NAR and AR models, we propose a new model named the two-step non-autoregressive transformer (TSNAT).
The results show that TSNAT can achieve performance competitive with the AR model and outperform many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z)
- Alleviate Exposure Bias in Sequence Prediction with Recurrent Neural Networks [47.52214243454995]
A popular strategy to train recurrent neural networks (RNNs) is to take the ground truth as input at each time step.
We propose a fully differentiable training algorithm for RNNs to better capture long-term dependencies.
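As a minimal illustration of the "ground truth as input at each time step" (teacher forcing) strategy referred to above, and not of the paper's proposed fully differentiable alternative, the toy PyTorch sketch below trains a GRU language model on gold prefixes; the model sizes and names are illustrative assumptions.

```python
# Minimal teacher-forcing sketch: at every step the RNN is conditioned on the
# gold previous token, which is the source of exposure bias at inference time
# when the model must instead consume its own predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 100, 64                     # illustrative sizes
embed = nn.Embedding(VOCAB, HIDDEN)
rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
head = nn.Linear(HIDDEN, VOCAB)

def teacher_forcing_loss(tokens):
    """tokens: (batch, seq_len) gold ids; predict tokens[:, 1:] from tokens[:, :-1]."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    states, _ = rnn(embed(inputs))          # gold prefix fed at every step
    logits = head(states)                   # (batch, seq_len - 1, VOCAB)
    return F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))

loss = teacher_forcing_loss(torch.randint(0, VOCAB, (8, 20)))
loss.backward()
```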
arXiv Detail & Related papers (2021-03-22T06:15:22Z)
- Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment [18.487842656780728]
Infilling and iterative refinement models close some of the gap to autoregressive models by editing the outputs of a non-autoregressive model.
We propose iterative realignment, where refinements occur over latent alignments rather than output sequence space.
arXiv Detail & Related papers (2020-10-24T09:35:37Z)
- SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization [39.85688193525843]
We study a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens.
We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment.
The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text.
arXiv Detail & Related papers (2020-06-18T00:13:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.