LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention
- URL: http://arxiv.org/abs/2002.03084v1
- Date: Sat, 8 Feb 2020 04:11:03 GMT
- Title: LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention
- Authors: Xiaoya Li, Yuxian Meng, Arianna Yuan, Fei Wu, Jiwei Li
- Abstract summary: Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass.
These NAT models often suffer from the multimodality problem, generating duplicated tokens or missing tokens.
We propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism.
- Score: 54.18121922040521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass and are highly efficient at the inference stage compared with autoregressive translation (AT) methods. However, NAT models often suffer from the multimodality problem, i.e., generating duplicated tokens or missing tokens. In this paper, we propose two novel methods to address this issue: the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism. The Look-Around strategy predicts the neighbor tokens in order to predict the current token, and the Vocabulary Attention mechanism models long-term token dependencies inside the decoder by attending to the whole vocabulary at each position to acquire knowledge of which token is about to be generated. We also propose a dynamic bidirectional decoding approach to accelerate the inference process of the LAVA model while preserving the high quality of the generated output. Our proposed model uses significantly less time during inference compared with autoregressive models and most other NAT models. Our experiments on four benchmarks (WMT14 En$\rightarrow$De, WMT14 De$\rightarrow$En, WMT16 Ro$\rightarrow$En and IWSLT14 De$\rightarrow$En) show that the proposed model achieves competitive performance compared with state-of-the-art non-autoregressive and autoregressive models while significantly reducing the time cost in the inference phase.
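Below is a minimal PyTorch sketch of the two mechanisms as described in the abstract. The module names, tensor shapes, and the way the vocabulary mixture and neighbor predictions are fused are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VocabularyAttention(nn.Module):
    """Sketch: each decoder position attends over the whole vocabulary
    embedding table and mixes the result back into its hidden state."""
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.vocab_emb = nn.Embedding(vocab_size, hidden_dim)  # shared vocab table

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim)
        scores = hidden @ self.vocab_emb.weight.T                # (B, T, V)
        mix = F.softmax(scores, dim=-1) @ self.vocab_emb.weight  # (B, T, H)
        return hidden + mix  # fuse vocabulary context into each position

class LookAroundHead(nn.Module):
    """Sketch: besides the current token, each position also predicts its
    left and right neighbours, encouraging locally consistent outputs."""
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.left = nn.Linear(hidden_dim, vocab_size)
        self.center = nn.Linear(hidden_dim, vocab_size)
        self.right = nn.Linear(hidden_dim, vocab_size)

    def forward(self, hidden: torch.Tensor):
        # returns logits for the left neighbor, current token, and right neighbor
        return self.left(hidden), self.center(hidden), self.right(hidden)
```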
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
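A generic sketch of the mask-and-re-predict parallel decoding loop that the GMLM summary above refers to; the model signature, mask handling, and the linear re-masking schedule are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

@torch.no_grad()
def iterative_parallel_decode(model, src, tgt_len, mask_id, num_iters=4):
    """Start from an all-mask target and repeatedly re-predict, keeping the
    most confident tokens and re-masking the rest (illustrative schedule)."""
    batch = src.size(0)
    tgt = torch.full((batch, tgt_len), mask_id, dtype=torch.long, device=src.device)
    for it in range(num_iters):
        logits = model(src, tgt)                    # (B, T, V); assumed signature
        probs, preds = logits.softmax(-1).max(-1)   # per-position confidence and argmax
        tgt = preds
        if it < num_iters - 1:
            # re-mask a shrinking fraction of the least confident positions
            n_mask = int(tgt_len * (1.0 - (it + 1) / num_iters))
            if n_mask > 0:
                idx = probs.argsort(dim=-1)[:, :n_mask]
                tgt.scatter_(1, idx, mask_id)
    return tgt
```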
- Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively.
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
arXiv Detail & Related papers (2023-06-10T05:24:29Z)
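A hedged sketch of the CTC objective mentioned above, using PyTorch's built-in nn.CTCLoss; treating the upsampled encoder outputs of the fine-tuned PMLM as the CTC input sequence is an assumption about the setup, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

# blank index 0 is an assumption; zero_infinity guards against degenerate alignments
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

def ctc_training_step(encoder_logits, targets, input_lengths, target_lengths):
    # encoder_logits: (batch, upsampled_src_len, vocab) from the fine-tuned encoder
    # nn.CTCLoss expects log-probabilities shaped (T, N, C)
    log_probs = encoder_logits.log_softmax(-1).transpose(0, 1)
    return ctc_loss(log_probs, targets, input_lengths, target_lengths)
```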
- N-Gram Nearest Neighbor Machine Translation [101.25243884801183]
We propose a novel $n$-gram nearest neighbor retrieval method that is model agnostic and applicable to both Autoregressive Translation (AT) and Non-Autoregressive Translation (NAT) models.
We demonstrate that the proposed method consistently outperforms the token-level method on both AT and NAT models, as well as on both general and domain adaptation translation tasks.
arXiv Detail & Related papers (2023-01-30T13:19:19Z)
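An illustrative sketch of retrieving at the n-gram level rather than per token, as summarized above; the flat datastore layout, distance metric, and the way retrieved n-grams are turned into a token distribution are all assumptions made for the example.

```python
import torch

def ngram_knn_distribution(query_states, keys, value_ngrams, vocab_size, k=8, temperature=10.0):
    """query_states: (n, hidden) decoder states for the current n-gram context.
    keys: (N, n*hidden) stored context keys; value_ngrams: (N, n) stored target n-grams."""
    query = query_states.reshape(1, -1)                # flatten the n-gram context
    dists = torch.cdist(query, keys)                   # (1, N) L2 distances to all entries
    knn_dists, idx = dists.topk(k, largest=False)      # k nearest datastore entries
    weights = torch.softmax(-knn_dists / temperature, dim=-1)
    probs = torch.zeros(vocab_size)
    first_tokens = value_ngrams[idx[0], 0]             # first token of each retrieved n-gram
    probs.scatter_add_(0, first_tokens, weights[0])    # closer neighbours vote with more weight
    return probs  # typically interpolated with the model's own next-token distribution
```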
- Modeling Coverage for Non-Autoregressive Neural Machine Translation [9.173385214565451]
We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z)
- TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech Recognition [69.68154370877615]
Non-autoregressive (NAR) models remove the temporal dependency between output tokens and can predict the entire output sequence in as little as one step.
To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT).
The results show that TSNAT achieves performance competitive with the AR model and outperforms many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z)
- Fast Sequence Generation with Multi-Agent Reinforcement Learning [40.75211414663022]
Non-autoregressive decoding has been proposed in machine translation to speed up the inference time by generating all words in parallel.
We propose a simple and efficient model for Non-Autoregressive sequence Generation (NAG) with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL).
On the MSCOCO image captioning benchmark, our NAG method achieves performance comparable to state-of-the-art autoregressive models, while bringing a 13.9x decoding speedup.
arXiv Detail & Related papers (2021-01-24T12:16:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.