Learning to Recover from Multi-Modality Errors for Non-Autoregressive
Neural Machine Translation
- URL: http://arxiv.org/abs/2006.05165v1
- Date: Tue, 9 Jun 2020 10:12:16 GMT
- Title: Learning to Recover from Multi-Modality Errors for Non-Autoregressive
Neural Machine Translation
- Authors: Qiu Ran, Yankai Lin, Peng Li, Jie Zhou
- Abstract summary: Non-autoregressive neural machine translation (NAT) predicts the entire target sequence simultaneously and significantly accelerates the inference process.
We propose a novel semi-autoregressive model RecoverSAT, which generates a translation as a sequence of segments.
By dynamically determining segment length and deleting repetitive segments, RecoverSAT is capable of recovering from repetitive and missing token errors.
Experimental results on three widely-used benchmark datasets show that our proposed model achieves more than a 4$\times$ speedup while maintaining performance comparable to the corresponding autoregressive model.
- Score: 38.123025955523836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive neural machine translation (NAT) predicts the entire
target sequence simultaneously and significantly accelerates the inference
process. However, NAT discards the dependency information within a sentence and
thus inevitably suffers from the multi-modality problem: the target tokens may
come from different possible translations, often causing repeated or missing
tokens. To alleviate this problem, we propose a novel semi-autoregressive
model RecoverSAT in this work, which generates a translation as a sequence of
segments. The segments are generated simultaneously while each segment is
predicted token-by-token. By dynamically determining segment length and
deleting repetitive segments, RecoverSAT is capable of recovering from
repetitive and missing token errors. Experimental results on three widely-used
benchmark datasets show that our proposed model achieves more than a 4$\times$
speedup while maintaining performance comparable to that of the corresponding
autoregressive model.
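
To make the decoding scheme above concrete, here is a minimal Python sketch of segment-parallel, token-serial generation with dynamic segment length and segment deletion. It is an illustration under assumptions, not the authors' implementation: the `step_fn` callback, the fixed segment count, and the special `EOS`/`DEL` tokens are hypothetical stand-ins for the model's actual interface.

```python
# Illustrative sketch only: a RecoverSAT-style decoding loop in which a fixed number
# of segments are generated in parallel while each segment grows token-by-token.
# EOS ends a segment early (dynamic segment length); DEL discards a segment that
# the model judges to be a repetition of another segment.
EOS, DEL = "<eos>", "<del>"

def recover_sat_decode(step_fn, num_segments, max_segment_len):
    """Decode `num_segments` segments in parallel, one token per segment per step.

    `step_fn(segments, active)` is assumed to return one next token per segment,
    conditioned on everything generated so far in *all* segments.
    """
    segments = [[] for _ in range(num_segments)]
    active = [True] * num_segments

    for _ in range(max_segment_len):
        if not any(active):
            break
        next_tokens = step_fn(segments, active)   # one synchronous decoding step
        for i, tok in enumerate(next_tokens):
            if not active[i]:
                continue
            if tok == EOS:          # segment chooses its own length
                active[i] = False
            elif tok == DEL:        # recover from a repetitive segment by dropping it
                segments[i] = []
                active[i] = False
            else:                   # ordinary token: append and keep growing
                segments[i].append(tok)

    # The translation is the concatenation of the surviving segments.
    return [tok for seg in segments for tok in seg]

if __name__ == "__main__":
    # Toy scripted "model": segment 1 starts repeating segment 0 and deletes itself.
    script = [["there", "are", EOS], ["there", DEL], ["two", "cats", EOS]]
    def toy_step(segments, active):
        return [script[i][len(segments[i])] if active[i] else None
                for i in range(len(segments))]
    print(recover_sat_decode(toy_step, num_segments=3, max_segment_len=8))
    # -> ['there', 'are', 'two', 'cats']
```

In the real model the number of segments and the termination and deletion decisions come from the trained decoder; the sketch only shows why this scheme can recover from repeated or missing tokens while still decoding segments in parallel.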
Related papers
- Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on-par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
- Towards Faster k-Nearest-Neighbor Machine Translation [51.866464707284635]
k-nearest-neighbor machine translation approaches suffer from heavy retrieval overhead, searching the entire datastore when decoding each token.
We propose a simple yet effective multi-layer perceptron (MLP) network to predict whether each token should be translated jointly by the neural machine translation model and the probabilities produced by kNN retrieval, or by the NMT model alone (see the illustrative gating sketch after this list).
Our method significantly reduces the overhead of kNN retrievals by up to 53% at the expense of a slight decline in translation quality.
arXiv Detail & Related papers (2023-12-12T16:41:29Z)
- RecycleGPT: An Autoregressive Language Model with Recyclable Module [13.243551482623623]
We present RecycleGPT, a generative language model with fast decoding speed.
Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations.
Experiments and analysis demonstrate the effectiveness of our approach in lowering inference latency, achieving up to 1.4x speedup.
arXiv Detail & Related papers (2023-08-07T09:14:33Z)
- Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation [92.42032403795879]
We show that pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts.
We attribute their overestimation of token-level repetition probabilities to the learning bias.
We find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
arXiv Detail & Related papers (2023-07-04T07:53:55Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation [10.773010211146694]
We propose a faster re-translation system based on a non-autoregressive sequence generation model (FReTNA).
The proposed model reduces the average computation time by a factor of 20 when compared to the ReTA model.
It also outperforms the streaming-based Wait-k model both in terms of time (1.5 times lower) and translation quality.
arXiv Detail & Related papers (2020-12-29T09:43:27Z)
- LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention [54.18121922040521]
Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass.
These NAT models often suffer from the multimodality problem, generating duplicated tokens or missing tokens.
We propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism.
arXiv Detail & Related papers (2020-02-08T04:11:03Z)
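
As a side note on the k-nearest-neighbor entry above, the gating idea can be sketched as a small PyTorch module. This is a hedged illustration, not that paper's code: the `RetrievalGate` name, hidden size, decision threshold, and the interpolation using the gate probability are assumptions; the paper's MLP inputs and combination rule may differ.

```python
# Hedged sketch: a small MLP decides, per token, whether the expensive kNN
# datastore lookup is worth doing, so most tokens can skip retrieval entirely.
import torch
import torch.nn as nn

class RetrievalGate(nn.Module):
    """Predicts the probability that kNN retrieval should be used for a token."""
    def __init__(self, d_model: int, d_hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, decoder_state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.mlp(decoder_state)).squeeze(-1)

def next_token_probs(decoder_state, nmt_probs, knn_lookup, gate, threshold=0.5):
    """Skip the datastore search whenever the gate says the NMT model suffices."""
    p_use_knn = gate(decoder_state)
    if p_use_knn.item() < threshold:
        return nmt_probs                      # no retrieval overhead for this token
    knn_probs = knn_lookup(decoder_state)     # expensive nearest-neighbor search
    lam = p_use_knn.unsqueeze(-1)             # illustrative interpolation weight
    return lam * knn_probs + (1.0 - lam) * nmt_probs
```

The gate itself is only a few small matrix multiplications, so its per-token cost is negligible next to a nearest-neighbor search over a large datastore, which is where the reported reduction in retrieval overhead would come from.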
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.