Revisiting Non-Autoregressive Translation at Scale
- URL: http://arxiv.org/abs/2305.16155v2
- Date: Fri, 2 Jun 2023 13:58:00 GMT
- Title: Revisiting Non-Autoregressive Translation at Scale
- Authors: Zhihao Wang, Longyue Wang, Jinsong Su, Junfeng Yao, Zhaopeng Tu
- Abstract summary: We systematically study the impact of scaling on non-autoregressive translation (NAT) behaviors.
We show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance.
We establish a new benchmark by validating scaled NAT models on a scaled dataset.
- Score: 76.93869248715664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world systems, scaling has been critical for improving
translation quality in autoregressive translation (AT), but it has not been
well studied for non-autoregressive translation (NAT). In this work, we
bridge the gap by systematically studying the impact of scaling on NAT
behaviors. Extensive experiments on six WMT benchmarks over two advanced NAT
models show that scaling can alleviate the commonly-cited weaknesses of NAT
models, resulting in better translation performance. To reduce the side-effect
of scaling on decoding speed, we empirically investigate the impact of NAT
encoder and decoder on the translation performance. Experimental results on the
large-scale WMT20 En-De show that an asymmetric architecture (e.g., a larger
encoder paired with a smaller decoder) achieves performance comparable to the
scaled model while retaining the decoding-speed advantage of standard NAT
models. Building on these findings, we establish a new benchmark by validating
scaled NAT models on the scaled dataset, which can serve as a strong baseline
for future work. We release code and system outputs at
https://github.com/DeepLearnXMU/Scaling4NAT.
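The asymmetric design highlighted in the abstract (a deep encoder paired with a much shallower decoder, decoding all target positions in parallel) can be illustrated with a minimal PyTorch sketch. The layer counts, dimensions, and zero-token placeholder inputs below are illustrative assumptions, not the configuration released in the repository above.

```python
import torch
import torch.nn as nn

class AsymmetricNAT(nn.Module):
    """Toy non-autoregressive Transformer with a deep encoder and a shallow
    decoder. Sizes are illustrative, not the paper's released configuration."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 enc_layers=12, dec_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Asymmetry: most of the model capacity lives in the encoder.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, 4 * d_model, batch_first=True),
            num_layers=enc_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, 4 * d_model, batch_first=True),
            num_layers=dec_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, tgt_length):
        memory = self.encoder(self.embed(src_tokens))
        # NAT decoding: feed tgt_length placeholder positions and predict every
        # target token in one pass (no causal mask). Real NAT models typically
        # copy source embeddings or predict the target length instead.
        placeholder = torch.zeros(src_tokens.size(0), tgt_length,
                                  dtype=torch.long, device=src_tokens.device)
        return self.out(self.decoder(self.embed(placeholder), memory))

# Parallel decoding: all target positions are emitted at once.
# tokens = AsymmetricNAT()(torch.randint(0, 32000, (2, 7)), tgt_length=9).argmax(-1)
```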
Related papers
- Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis [82.72941975704374]
Non-autoregressive Transformers (NATs) have been recognized for their rapid generation.
We re-evaluate the full potential of NATs by revisiting the design of their training and inference strategies.
We propose to go beyond existing methods by directly solving the optimal strategies in an automatic framework.
arXiv Detail & Related papers (2024-06-08T13:52:20Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
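The contrastive-learning entry above proposes sampling training targets from the model distribution rather than the data distribution. A minimal sketch of that general idea follows; the independent per-position sampling and plain cross-entropy loss are assumptions for illustration and do not reproduce the paper's contrastive objective.

```python
import torch
import torch.nn.functional as F

def loss_on_model_samples(logits):
    """Fit a NAT model to targets drawn from its own distribution instead of the
    gold references. `logits` is a (batch, tgt_len, vocab) tensor produced by a
    NAT decoder; the sampling and loss here are a simplified illustration."""
    batch, tgt_len, vocab = logits.shape
    with torch.no_grad():
        # Sample every target position independently from the model distribution.
        probs = F.softmax(logits, dim=-1).reshape(-1, vocab)
        sampled = torch.multinomial(probs, num_samples=1).view(batch, tgt_len)
    # Cross-entropy against the model's own samples rather than the data.
    return F.cross_entropy(logits.reshape(-1, vocab), sampled.reshape(-1))
```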
- Non-Autoregressive Document-Level Machine Translation [35.48195990457836]
Non-autoregressive translation (NAT) models achieve comparable performance and superior speed compared to auto-regressive translation (AT) models.
However, their abilities are unexplored in document-level machine translation (MT).
We propose a simple but effective design of sentence alignment between source and target.
arXiv Detail & Related papers (2023-05-22T09:59:59Z)
- RenewNAT: Renewing Potential Translation for Non-Autoregressive Transformer [15.616188012177538]
Non-autoregressive neural machine translation (NAT) models are proposed to accelerate the inference process while maintaining relatively high performance.
Existing NAT models struggle to achieve the desired efficiency-quality trade-off.
We propose RenewNAT, a flexible framework with high efficiency and effectiveness.
arXiv Detail & Related papers (2023-03-14T07:10:03Z)
- DePA: Improving Non-autoregressive Machine Translation with Dependency-Aware Decoder [32.18389249619327]
Non-autoregressive machine translation (NAT) models have lower translation quality than autoregressive translation (AT) models.
We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models.
arXiv Detail & Related papers (2022-03-30T12:53:20Z)
- Sequence-Level Training for Non-Autoregressive Neural Machine Translation [33.17341980163439]
Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup.
We propose using sequence-level training objectives to train NAT models, which evaluate the NAT outputs as a whole and correlate well with the real translation quality.
arXiv Detail & Related papers (2021-06-15T13:30:09Z)
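A sequence-level objective of the kind the entry above describes can be sketched as a REINFORCE-style loss: a whole-sequence reward weights the log-probability of each sampled hypothesis. The toy unigram-overlap reward stands in for the BLEU/GLEU-style metrics and is an assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sequence_level_loss(logits, sampled_tokens, reference_tokens):
    """Weight each hypothesis' sequence log-probability by a whole-sequence
    reward. Shapes: logits (B, T, V); sampled_tokens, reference_tokens (B, T)."""
    # Per-position log-probability of the sampled hypothesis, summed per sentence.
    log_probs = F.log_softmax(logits, dim=-1)
    tok_logp = log_probs.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)  # (B, T)
    seq_logp = tok_logp.sum(dim=1)                                             # (B,)

    # Whole-sequence reward: fraction of reference tokens recovered (toy metric).
    rewards = []
    for hyp, ref in zip(sampled_tokens.tolist(), reference_tokens.tolist()):
        ref_set = set(ref)
        rewards.append(len(ref_set & set(hyp)) / max(len(ref_set), 1))
    reward = torch.tensor(rewards, device=logits.device)

    # Higher-reward hypotheses receive a larger share of the gradient.
    return -(reward * seq_logp).mean()
```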
- Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade [47.97977478431973]
Fully non-autoregressive neural machine translation (NAT) predicts all tokens simultaneously in a single forward pass of the network.
In this work, we aim to close the performance gap while maintaining the latency advantage.
arXiv Detail & Related papers (2020-12-31T18:52:59Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)
- Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation [32.77372312124259]
Non-Autoregressive Machine Translation (NAT) models have demonstrated significant inference speedup but suffer from inferior translation accuracy.
We propose to adopt multi-task learning to transfer autoregressive machine translation knowledge to NAT models through encoder sharing.
Experimental results on WMT14 English-German and WMT16 English-Romanian datasets show that the proposed Multi-Task NAT achieves significant improvements over the baseline NAT models.
arXiv Detail & Related papers (2020-10-24T11:00:58Z)
- Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation [188.3605563567253]
Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT).
We introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT covers AT and NAT as its special cases.
We design curriculum schedules to gradually shift k from 1 to N, with different pacing functions and number of tasks trained at the same time.
Experiments on IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines.
arXiv Detail & Related papers (2020-07-17T06:06:54Z)
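The curriculum in the entry above, shifting the semi-autoregressive group size k from 1 (autoregressive) toward N (fully non-autoregressive) under different pacing functions, can be sketched as a small scheduling helper. The specific schedules and example numbers are illustrative assumptions, not the paper's exact pacing functions.

```python
import math

def pacing_k(step, total_steps, max_k, schedule="linear"):
    """Map training progress to the semi-autoregressive group size k, moving from
    k=1 (fully autoregressive) toward k=max_k (fully non-autoregressive)."""
    progress = min(step / max(total_steps, 1), 1.0)
    if schedule == "linear":
        frac = progress
    elif schedule == "exp":
        frac = (math.exp(progress) - 1) / (math.e - 1)   # slow start, fast finish
    else:  # "sqrt": fast start, slow finish
        frac = math.sqrt(progress)
    return max(1, round(1 + frac * (max_k - 1)))

# Example: a 100k-step run that ends fully non-autoregressive with k=8.
# [pacing_k(s, 100_000, 8) for s in (0, 25_000, 50_000, 100_000)]  -> [1, 3, 4, 8]
```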
This list is automatically generated from the titles and abstracts of the papers on this site.