Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade
- URL: http://arxiv.org/abs/2012.15833v1
- Date: Thu, 31 Dec 2020 18:52:59 GMT
- Title: Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade
- Authors: Jiatao Gu, Xiang Kong
- Abstract summary: Fully non-autoregressive neural machine translation (NAT) is proposed to predict all target tokens simultaneously with a single forward pass of the network.
In this work, we aim to close the performance gap while maintaining the latency advantage.
- Score: 47.97977478431973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully non-autoregressive neural machine translation (NAT) is proposed to
predict all target tokens simultaneously with a single forward pass of the
network, which significantly reduces inference latency at the expense of a
quality drop compared to the Transformer baseline. In this work, we aim to
close the performance gap while maintaining the latency advantage. We first
inspect the fundamental issues of fully NAT models and adopt dependency
reduction in the learning space of output tokens as the basic guidance. We
then revisit methods in four different aspects that have proven effective for
improving NAT models, and carefully combine these techniques with the
necessary modifications. Extensive experiments on three translation benchmarks
show that the proposed system achieves new state-of-the-art results for fully
NAT models and obtains performance comparable to autoregressive and iterative
NAT systems. For instance, one of the proposed models achieves 27.49 BLEU on
WMT14 En-De with an approximately 16.5× speedup at inference time.
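The latency advantage the abstract describes comes from the number of forward passes at inference time: an autoregressive decoder runs one pass per output token, while a fully non-autoregressive decoder runs a single pass for the whole sequence. A minimal toy sketch of that contrast, using a hypothetical stand-in `toy_forward` instead of a real Transformer (names and logic here are illustrative assumptions, not the paper's implementation):

```python
# Toy contrast between autoregressive (AR) and fully non-autoregressive (NAT)
# decoding. `toy_forward` is a hypothetical stand-in for a model's forward
# pass: given a source and a target length, it emits one token per position.
# A real NAT system would use a Transformer; here we just echo the source so
# the example is self-contained and runnable.

def toy_forward(src, length):
    # Hypothetical forward pass: predicts `length` tokens in parallel.
    return [src[i % len(src)] for i in range(length)]

def decode_autoregressive(src, length):
    """Generate tokens one at a time: `length` sequential forward passes."""
    out, passes = [], 0
    for _ in range(length):
        passes += 1
        # Each step re-runs the model and keeps only the newest token.
        out.append(toy_forward(src, len(out) + 1)[-1])
    return out, passes

def decode_nat(src, length):
    """Predict all tokens simultaneously: a single forward pass."""
    return toy_forward(src, length), 1

src = ["a", "b", "c"]
ar_out, ar_passes = decode_autoregressive(src, 6)
nat_out, nat_passes = decode_nat(src, 6)
# Same output in this toy setting, but 6 passes vs. 1: the NAT speedup
# (e.g. the ~16.5x reported above) comes from removing the sequential loop.
```

The quality gap arises because the single parallel pass cannot condition each token on previously generated tokens, which is why the paper's guiding idea is reducing dependencies among output tokens.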
Related papers
- Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis [82.72941975704374]
Non-autoregressive Transformers (NATs) have been recognized for their rapid generation.
We re-evaluate the full potential of NATs by revisiting the design of their training and inference strategies.
We propose to go beyond existing methods by directly solving the optimal strategies in an automatic framework.
arXiv Detail & Related papers (2024-06-08T13:52:20Z)
- Revisiting Non-Autoregressive Translation at Scale [76.93869248715664]
We systematically study the impact of scaling on non-autoregressive translation (NAT) behaviors.
We show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance.
We establish a new benchmark by validating scaled NAT models on a scaled dataset.
arXiv Detail & Related papers (2023-05-25T15:22:47Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- RenewNAT: Renewing Potential Translation for Non-Autoregressive Transformer [15.616188012177538]
Non-autoregressive neural machine translation (NAT) models are proposed to accelerate the inference process while maintaining relatively high performance.
Existing NAT models struggle to achieve the desired efficiency-quality trade-off.
We propose RenewNAT, a flexible framework with high efficiency and effectiveness.
arXiv Detail & Related papers (2023-03-14T07:10:03Z)
- On the Learning of Non-Autoregressive Transformers [91.34196047466904]
Non-autoregressive Transformer (NAT) is a family of text generation models.
We present theoretical and empirical analyses to reveal the challenges of NAT learning.
arXiv Detail & Related papers (2022-06-13T08:42:09Z)
- Sequence-Level Training for Non-Autoregressive Neural Machine Translation [33.17341980163439]
Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup.
We propose using sequence-level training objectives to train NAT models; these objectives evaluate NAT outputs as a whole and correlate well with real translation quality.
arXiv Detail & Related papers (2021-06-15T13:30:09Z)
- Modeling Coverage for Non-Autoregressive Neural Machine Translation [9.173385214565451]
We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z)
- Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation [188.3605563567253]
Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT).
We introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT covers AT and NAT as its special cases.
We design curriculum schedules to gradually shift k from 1 to N, with different pacing functions and number of tasks trained at the same time.
Experiments on the IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De, and WMT14 De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines.
arXiv Detail & Related papers (2020-07-17T06:06:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.