DePA: Improving Non-autoregressive Machine Translation with
Dependency-Aware Decoder
- URL: http://arxiv.org/abs/2203.16266v2
- Date: Wed, 2 Aug 2023 06:13:35 GMT
- Title: DePA: Improving Non-autoregressive Machine Translation with
Dependency-Aware Decoder
- Authors: Jiaao Zhan, Qian Chen, Boxing Chen, Wen Wang, Yu Bai, Yang Gao
- Abstract summary: Non-autoregressive machine translation (NAT) models have lower translation quality than autoregressive translation (AT) models.
We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models.
- Score: 32.18389249619327
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Non-autoregressive machine translation (NAT) models have lower translation
quality than autoregressive translation (AT) models because NAT decoders do not
depend on previous target tokens in the decoder input. We propose a novel and
general Dependency-Aware Decoder (DePA) to enhance target dependency modeling
in the decoder of fully NAT models from two perspectives: decoder
self-attention and decoder input. First, we propose an autoregressive
forward-backward pre-training phase before NAT training, which enables the NAT
decoder to gradually learn bidirectional target dependencies for the final NAT
training. Second, we transform the decoder input from the source language
representation space to the target language representation space through a
novel attentive transformation process, which enables the decoder to better
capture target dependencies. DePA can be applied to any fully NAT models.
Extensive experiments show that DePA consistently improves highly competitive
and state-of-the-art fully NAT models on widely used WMT and IWSLT benchmarks
by up to 1.88 BLEU gain, while maintaining the inference latency comparable to
other fully NAT models.
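Of the two ideas, the attentive transformation of the decoder input is the more mechanical, so a small sketch may help. The PyTorch snippet below is a minimal, illustrative reading only: it assumes the decoder input (e.g., uniformly copied source encoder states) is mapped into the target representation space by attending over the target embedding table. The class name `AttentiveTransform`, the single query projection, and the softmax-over-vocabulary form are assumptions for illustration, not DePA's exact layer; the forward-backward pre-training phase is a training schedule rather than a module and is not shown.
```python
import torch
import torch.nn as nn


class AttentiveTransform(nn.Module):
    """Illustrative sketch: map the NAT decoder input from the source
    representation space into the target embedding space by attending over
    the target-side embedding table. This is one plausible reading of an
    "attentive transformation", not DePA's exact formulation."""

    def __init__(self, d_model: int, tgt_embedding: nn.Embedding):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.tgt_embedding = tgt_embedding  # shared target embedding table

    def forward(self, decoder_input: torch.Tensor) -> torch.Tensor:
        # decoder_input: (batch, tgt_len, d_model), e.g. copied source
        # encoder states, as commonly used in fully NAT models.
        queries = self.query_proj(decoder_input)             # (B, T, D)
        keys = values = self.tgt_embedding.weight            # (V, D)
        scores = queries @ keys.t() / keys.size(-1) ** 0.5   # (B, T, V)
        attn = torch.softmax(scores, dim=-1)
        # Convex combination of target embeddings: the result lives in the
        # target representation space and is fed to the NAT decoder.
        return attn @ values                                  # (B, T, D)


if __name__ == "__main__":
    emb = nn.Embedding(num_embeddings=32000, embedding_dim=512)
    layer = AttentiveTransform(d_model=512, tgt_embedding=emb)
    src_states = torch.randn(2, 10, 512)  # stand-in for copied encoder outputs
    print(layer(src_states).shape)        # torch.Size([2, 10, 512])
```
In this reading the transformed states would replace the copied source states as the NAT decoder input; the example above only checks shapes.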
Related papers
- Non-autoregressive Machine Translation with Probabilistic Context-free Grammar [30.423141482617957]
Non-autoregressive Transformer (NAT) significantly accelerates the inference of neural machine translation.
We propose PCFG-NAT, which leverages a specially designed Probabilistic Context-Free Grammar (PCFG) to enhance the ability of NAT models to capture complex dependencies.
arXiv Detail & Related papers (2023-11-14T06:39:04Z)
- Revisiting Non-Autoregressive Translation at Scale [76.93869248715664]
We systematically study the impact of scaling on non-autoregressive translation (NAT) behaviors.
We show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance.
We establish a new benchmark by validating scaled NAT models on a scaled dataset.
arXiv Detail & Related papers (2023-05-25T15:22:47Z)
- RenewNAT: Renewing Potential Translation for Non-Autoregressive Transformer [15.616188012177538]
Non-autoregressive neural machine translation (NAT) models are proposed to accelerate the inference process while maintaining relatively high performance.
However, existing NAT models struggle to achieve the desired efficiency-quality trade-off.
We propose RenewNAT, a flexible framework with high efficiency and effectiveness.
arXiv Detail & Related papers (2023-03-14T07:10:03Z)
- Sequence-Level Training for Non-Autoregressive Neural Machine Translation [33.17341980163439]
Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup.
We propose using sequence-level training objectives to train NAT models, which evaluate the NAT outputs as a whole and correlate well with the real translation quality.
arXiv Detail & Related papers (2021-06-15T13:30:09Z)
- Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade [47.97977478431973]
Fully non-autoregressive neural machine translation (NAT) is proposed to predict all tokens simultaneously with a single forward pass of the neural network.
In this work, we aim to close the performance gap while maintaining the latency advantage.
arXiv Detail & Related papers (2020-12-31T18:52:59Z)
- Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation [32.77372312124259]
Non-Autoregressive machine Translation (NAT) models have demonstrated significant inference speedup but suffer from inferior translation accuracy.
We propose to adopt Multi-Task learning to transfer the Autoregressive machine Translation knowledge to NAT models through encoder sharing.
Experimental results on WMT14 English-German and WMT16 English-Romanian datasets show that the proposed Multi-Task NAT achieves significant improvements over the baseline NAT models.
arXiv Detail & Related papers (2020-10-24T11:00:58Z)
- Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation [188.3605563567253]
Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT).
We introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT covers AT and NAT as its special cases.
We design curriculum schedules to gradually shift the number of tokens generated per decoding step, k, from 1 (the AT case) to N (the fully NAT case), with different pacing functions and numbers of tasks trained at the same time (a sketch of one possible pacing function follows this entry).
Experiments on IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines.
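The curriculum above can be pictured as a pacing function that maps the training step to the SAT degree k. The following is a hedged sketch in Python; the schedule names ("linear", "log", "square") and the exact interpolation are illustrative assumptions, not the pacing functions defined in the TCL-NAT paper.
```python
import math


def pacing_k(step: int, total_steps: int, max_k: int, schedule: str = "linear") -> int:
    """Illustrative pacing function for a task-level curriculum: return the
    SAT degree k (tokens generated per decoding step) at a given training
    step, moving from k=1 (AT-like) towards k=max_k (fully NAT-like).
    The concrete schedules below are assumptions for illustration only."""
    progress = min(step / max(total_steps, 1), 1.0)
    if schedule == "linear":
        frac = progress
    elif schedule == "log":
        # Concave: k grows quickly early in training, then flattens out.
        frac = math.log1p(progress * (math.e - 1.0))
    else:  # "square": convex, spends more steps on small k (AT-like tasks)
        frac = progress ** 2
    return max(1, round(1 + frac * (max_k - 1)))


# Example: a 100k-step run with a maximum SAT degree of 32.
for s in (0, 25_000, 50_000, 100_000):
    print(s, pacing_k(s, total_steps=100_000, max_k=32, schedule="linear"))
```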
arXiv Detail & Related papers (2020-07-17T06:06:54Z)
- LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention [54.18121922040521]
Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass.
These NAT models often suffer from the multimodality problem, generating duplicated tokens or missing tokens.
We propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism.
arXiv Detail & Related papers (2020-02-08T04:11:03Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends enables the shared encoder to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)