Imputer: Sequence Modelling via Imputation and Dynamic Programming
- URL: http://arxiv.org/abs/2002.08926v2
- Date: Wed, 22 Apr 2020 17:32:18 GMT
- Title: Imputer: Sequence Modelling via Imputation and Dynamic Programming
- Authors: William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi,
Navdeep Jaitly
- Abstract summary: Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the number of input or output tokens.
We present a tractable dynamic programming training algorithm, which yields a lower bound on the log marginal likelihood.
- Score: 101.5705527605346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the Imputer, a neural sequence model that generates
output sequences iteratively via imputations. The Imputer is an iterative
generative model, requiring only a constant number of generation steps
independent of the number of input or output tokens. The Imputer can be trained
to approximately marginalize over all possible alignments between the input and
output sequences, and all possible generation orders. We present a tractable
dynamic programming training algorithm, which yields a lower bound on the log
marginal likelihood. When applied to end-to-end speech recognition, the Imputer
outperforms prior non-autoregressive models and achieves competitive results to
autoregressive models. On LibriSpeech test-other, the Imputer achieves 11.1
WER, outperforming CTC at 13.0 WER and seq2seq at 12.5 WER.
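The training objective described above marginalizes over monotonic alignments with a dynamic program, in the spirit of CTC. As a rough sketch of the recursion such an objective builds on, and not the Imputer's exact algorithm (which additionally conditions on a partially generated alignment), the snippet below computes the standard CTC log marginal over all alignments of a target to T input frames; the function and variable names are my own.

```python
# A rough sketch (not the paper's exact algorithm) of the CTC-style dynamic
# program that sums over all monotonic alignments of a target sequence to T
# input frames. The Imputer's lower bound is computed over the same kind of
# alignment lattice, but additionally conditions on a partially observed
# alignment; that part is omitted here.
import numpy as np

NEG_INF = -1e30  # stand-in for log(0)

def logsumexp(*xs):
    m = max(xs)
    if m <= NEG_INF:
        return NEG_INF
    return m + np.log(sum(np.exp(x - m) for x in xs))

def ctc_log_marginal(log_probs, target, blank=0):
    """log p(target | input), summed over all CTC alignments.

    log_probs: (T, V) per-frame log distributions, blank at index `blank`.
    target:    list of label ids, none equal to `blank`.
    """
    T, _ = log_probs.shape
    # Blank-augmented target: [blank, y1, blank, y2, ..., yU, blank]
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    alpha = np.full((T, S), NEG_INF)          # alpha[t, s]: prefix score
    alpha[0, 0] = log_probs[0, blank]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            incoming = logsumexp(
                alpha[t - 1, s],
                alpha[t - 1, s - 1] if s >= 1 else NEG_INF,
                # Skipping the previous blank is allowed only between
                # distinct non-blank labels.
                alpha[t - 1, s - 2]
                if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]
                else NEG_INF,
            )
            alpha[t, s] = incoming + log_probs[t, ext[s]]

    # Valid alignments end in the last label or the trailing blank.
    return logsumexp(alpha[T - 1, S - 1],
                     alpha[T - 1, S - 2] if S > 1 else NEG_INF)
```

At decoding time, the constant-step property comes from block-parallel imputation: the alignment is split into blocks of size B and one token per block is imputed at each step, so generation finishes in B steps regardless of the input or output length.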
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome the cost of purely sequential, token-by-token refinement.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match the downstream use case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts the hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition [69.68154370877615]
The non-autoregressive (NAR) models remove the temporal dependency between output tokens and can predict the entire output sequence in as few as one step.
To address the remaining gap between NAR and AR models, we propose a new model named the two-step non-autoregressive transformer (TSNAT).
The results show that TSNAT can achieve performance competitive with the AR model and outperform many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z)
- Alleviate Exposure Bias in Sequence Prediction with Recurrent Neural Networks [47.52214243454995]
A popular strategy to train recurrent neural networks (RNNs) is to take the ground truth as input at each time step.
We propose a fully differentiable training algorithm for RNNs to better capture long-term dependencies.
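As a minimal illustration of the "ground truth as input at each time step" (teacher forcing) strategy referred to above, and not of the paper's proposed fully differentiable alternative, the toy PyTorch sketch below trains a GRU language model on gold prefixes; the model sizes and names are illustrative assumptions.

```python
# Minimal teacher-forcing sketch: at every step the RNN is conditioned on the
# gold previous token, which is the source of exposure bias at inference time
# when the model must instead consume its own predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 100, 64                     # illustrative sizes
embed = nn.Embedding(VOCAB, HIDDEN)
rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
head = nn.Linear(HIDDEN, VOCAB)

def teacher_forcing_loss(tokens):
    """tokens: (batch, seq_len) gold ids; predict tokens[:, 1:] from tokens[:, :-1]."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    states, _ = rnn(embed(inputs))          # gold prefix fed at every step
    logits = head(states)                   # (batch, seq_len - 1, VOCAB)
    return F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))

loss = teacher_forcing_loss(torch.randint(0, VOCAB, (8, 20)))
loss.backward()
```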
arXiv Detail & Related papers (2021-03-22T06:15:22Z)
- Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment [18.487842656780728]
Infilling and iterative refinement models close some of the gap to autoregressive models by editing the outputs of a non-autoregressive model.
We propose iterative realignment, where refinements occur over latent alignments rather than output sequence space.
arXiv Detail & Related papers (2020-10-24T09:35:37Z)
- SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization [39.85688193525843]
We study a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens.
We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment.
The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text.
arXiv Detail & Related papers (2020-06-18T00:13:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.