Incorporating a Local Translation Mechanism into Non-autoregressive Translation
- URL: http://arxiv.org/abs/2011.06132v1
- Date: Thu, 12 Nov 2020 00:32:51 GMT
- Title: Incorporating a Local Translation Mechanism into Non-autoregressive Translation
- Authors: Xiang Kong, Zhisong Zhang, Eduard Hovy
- Abstract summary: We introduce a novel local autoregressive translation mechanism into non-autoregressive translation (NAT) models.
For each target decoding position, instead of only one token, we predict a short sequence of tokens in an autoregressive way.
We design an efficient merging algorithm to align and merge the output pieces into one final output sequence.
- Score: 28.678752678905244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce a novel local autoregressive translation (LAT) mechanism into non-autoregressive translation (NAT) models so as to capture local dependencies among target outputs. Specifically, for each target decoding position, instead of only one token, we predict a short sequence of tokens in an autoregressive way. We further design an efficient merging algorithm to align and merge the output pieces into one final output sequence. We integrate LAT into the conditional masked language model (CMLM; Ghazvininejad et al., 2019) and similarly adopt iterative decoding. Empirical results on five translation tasks show that compared with CMLM, our method achieves comparable or better performance with fewer decoding iterations, bringing a 2.5x speedup. Further analysis indicates that our method reduces repeated translations and performs better on longer sentences.
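To make the piece-and-merge idea above concrete, below is a minimal, self-contained Python sketch. It is only an illustration under simplifying assumptions, not the paper's implementation: the per-position piece predictor is a stub built from a toy sentence, and the merge step uses a simple greedy suffix/prefix overlap heuristic rather than the paper's alignment-and-merge algorithm.

    """
    Illustrative sketch of local-piece prediction followed by merging.
    NOT the paper's code: the "model" below fabricates pieces from a toy
    reference, and the merge is a greedy overlap heuristic.
    """

    from typing import List

    Piece = List[str]


    def predict_local_pieces(length: int, piece_len: int = 3) -> List[Piece]:
        """Stand-in for the NAT model: for each target position, pretend to
        autoregressively predict a short piece of `piece_len` tokens.
        Here the pieces are sliced from a toy sentence purely so the merge
        step has overlapping inputs to work with."""
        reference = ["the", "cat", "sat", "on", "the", "mat", "."]
        pieces = [reference[i:i + piece_len] for i in range(length)]
        return [p for p in pieces if p]


    def merge_pieces(pieces: List[Piece]) -> List[str]:
        """Greedy merge: append each piece after dropping the longest prefix
        that already matches the current output's suffix, so tokens repeated
        across overlapping neighbouring pieces appear only once."""
        output: List[str] = []
        for piece in pieces:
            max_k = min(len(output), len(piece))
            overlap = 0
            for k in range(max_k, 0, -1):
                if output[-k:] == piece[:k]:
                    overlap = k
                    break
            output.extend(piece[overlap:])
        return output


    if __name__ == "__main__":
        pieces = predict_local_pieces(length=7, piece_len=3)
        print("pieces:", pieces)
        print("merged:", " ".join(merge_pieces(pieces)))

Because overlapping tokens between neighbouring pieces are collapsed during the merge, this style of decoding naturally discourages the repeated translations that the abstract reports as a common NAT failure mode.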
Related papers
- Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation [5.712277386555735]
Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks.
We propose SimulMask, a new paradigm for fine-tuning LLMs for simultaneous translation.
We have observed a significant translation quality improvement compared to state-of-the-art prompting optimization strategies on five language pairs.
arXiv Detail & Related papers (2024-05-16T21:07:42Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words.
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
arXiv Detail & Related papers (2024-02-24T08:10:39Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ points across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism [9.641454891414751]
Non-autoregressive models generate target words in parallel, which achieves faster decoding but at the cost of translation accuracy.
We propose a Self-Review Mechanism to infuse sequential information into a conditional masked translation model.
Our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.
arXiv Detail & Related papers (2020-10-19T03:38:56Z)
- Don't Parse, Insert: Multilingual Semantic Parsing with Insertion Based Decoding [10.002379593718471]
A successful parse transforms an input utterance into an action that is easily understood by the system.
For complex parsing tasks, the state-of-the-art method is based on autoregressive sequence-to-sequence models that generate the parse directly.
arXiv Detail & Related papers (2020-10-08T01:18:42Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8-15 times speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
- Non-Autoregressive Machine Translation with Disentangled Context Transformer [70.95181466892795]
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens.
We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts.
Our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
arXiv Detail & Related papers (2020-01-15T05:32:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.