Modeling Coverage for Non-Autoregressive Neural Machine Translation
- URL: http://arxiv.org/abs/2104.11897v1
- Date: Sat, 24 Apr 2021 07:33:23 GMT
- Title: Modeling Coverage for Non-Autoregressive Neural Machine Translation
- Authors: Yong Shan, Yang Feng, Chenze Shao
- Abstract summary: We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
- Score: 9.173385214565451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-Autoregressive Neural Machine Translation (NAT) has achieved significant
inference speedup by generating all tokens simultaneously. Despite its high
efficiency, NAT usually suffers from two kinds of translation errors:
over-translation (e.g. repeated tokens) and under-translation (e.g. missing
translations), which eventually limits the translation quality. In this paper,
we argue that these issues of NAT can be addressed through coverage modeling,
which has proven useful in autoregressive decoding. We propose a novel
Coverage-NAT that models coverage information directly through a token-level
coverage iterative refinement mechanism and a sentence-level coverage
agreement, which remind the model whether a source token has been translated
and improve the semantic consistency between the translation and the
source, respectively. Experimental results on WMT14 En-De and WMT16 En-Ro
translation tasks show that our method can alleviate those errors and achieve
strong improvements over the baseline system.
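To make the coverage idea concrete, below is a minimal sketch of attention-based coverage, in the spirit of coverage models for autoregressive NMT. It is not the paper's Coverage-NAT implementation (which uses an iterative refinement mechanism); the function names and the unit-coverage target are hypothetical illustrations of how per-source-token coverage and a sentence-level agreement-style penalty could be computed.

```python
import torch

def token_level_coverage(cross_attn):
    """Accumulate cross-attention mass per source token as a coverage vector.

    cross_attn: (batch, tgt_len, src_len) decoder cross-attention weights.
    Returns a (batch, src_len) tensor; values near 0 suggest an untranslated
    source token (under-translation), values well above 1 suggest
    over-translation.
    """
    return cross_attn.sum(dim=1)

def coverage_agreement_loss(coverage, src_mask):
    """Hypothetical agreement-style penalty: push each source token's total
    coverage toward 1 so the translation covers the source exactly once."""
    per_token = (coverage - torch.ones_like(coverage)) ** 2
    per_token = per_token * src_mask          # ignore padding positions
    return per_token.sum() / src_mask.sum()

# Toy usage with random attention weights.
batch, tgt_len, src_len = 2, 5, 4
attn = torch.softmax(torch.randn(batch, tgt_len, src_len), dim=-1)
src_mask = torch.ones(batch, src_len)
cov = token_level_coverage(attn)              # shape (2, 4)
loss = coverage_agreement_loss(cov, src_mask)
print(cov.shape, loss.item())
```

In this sketch the penalty is simply added to the translation loss during training; the paper's actual token-level and sentence-level mechanisms are described in the full text.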
Related papers
- DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation [29.76274107159478]
Non-autoregressive Transformers (NATs) are applied in direct speech-to-speech translation systems.
We introduce DiffNorm, a diffusion-based normalization strategy that simplifies data distributions for training NAT models.
Our strategies result in a notable improvement of about +7 ASR-BLEU for English-Spanish (En-Es) and +2 ASR-BLEU for English-French (En-Fr) on the CVSS benchmark.
arXiv Detail & Related papers (2024-05-22T01:10:39Z)
- Revisiting Non-Autoregressive Translation at Scale [76.93869248715664]
We systematically study the impact of scaling on non-autoregressive translation (NAT) behaviors.
We show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance.
We establish a new benchmark by validating scaled NAT models on a scaled dataset.
arXiv Detail & Related papers (2023-05-25T15:22:47Z)
- Non-Autoregressive Document-Level Machine Translation [35.48195990457836]
Non-autoregressive translation (NAT) models achieve performance comparable to autoregressive translation (AT) models at superior speed.
However, their ability in document-level machine translation (MT) remains unexplored.
We propose a simple but effective design of sentence alignment between source and target.
arXiv Detail & Related papers (2023-05-22T09:59:59Z)
- Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation [0.0]
Non-autoregressive neural machine translation (NAT) offers a substantial translation speedup compared to autoregressive neural machine translation (AT).
Latent variable modeling has emerged as a promising approach to bridge the quality gap between NAT and AT.
arXiv Detail & Related papers (2023-05-02T15:33:09Z)
- TransFool: An Adversarial Attack against Neural Machine Translation Models [49.50163349643615]
We investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool.
We generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples.
Based on automatic and human evaluations, TransFool improves over existing attacks in success rate, semantic similarity, and fluency.
arXiv Detail & Related papers (2023-02-02T08:35:34Z)
- Candidate Soups: Fusing Candidate Results Improves Translation Quality for Non-Autoregressive Translation [15.332496335303189]
Non-autoregressive translation (NAT) models achieve much faster inference than autoregressive translation (AT) models.
Existing NAT methods focus only on improving the NAT model's performance but do not fully exploit it.
We propose a simple but effective method called "Candidate Soups," which can obtain high-quality translations.
arXiv Detail & Related papers (2023-01-27T02:39:42Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade [47.97977478431973]
Fully non-autoregressive neural machine translation (NAT) predicts all tokens simultaneously in a single forward pass of the network.
In this work, we aim to close the performance gap while maintaining the latency advantage.
arXiv Detail & Related papers (2020-12-31T18:52:59Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)
- LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention [54.18121922040521]
Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass.
These NAT models often suffer from the multimodality problem, generating duplicated tokens or missing tokens.
We propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism.
arXiv Detail & Related papers (2020-02-08T04:11:03Z)