Non-autoregressive Machine Translation with Probabilistic Context-free
Grammar
- URL: http://arxiv.org/abs/2311.07941v1
- Date: Tue, 14 Nov 2023 06:39:04 GMT
- Title: Non-autoregressive Machine Translation with Probabilistic Context-free
Grammar
- Authors: Shangtong Gui, Chenze Shao, Zhengrui Ma, Xishan Zhang, Yunji Chen,
Yang Feng
- Abstract summary: Non-autoregressive Transformer (NAT) significantly accelerates the inference of neural machine translation.
We propose PCFG-NAT, which leverages a specially designed Probabilistic Context-Free Grammar (PCFG) to enhance the ability of NAT models to capture complex dependencies.
- Score: 30.423141482617957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive Transformer (NAT) significantly accelerates the inference
of neural machine translation. However, conventional NAT models suffer from
limited expressive power and performance degradation compared to autoregressive
(AT) models due to the assumption of conditional independence among target
tokens. To address these limitations, we propose a novel approach called
PCFG-NAT, which leverages a specially designed Probabilistic Context-Free
Grammar (PCFG) to enhance the ability of NAT models to capture complex
dependencies among output tokens. Experimental results on major machine
translation benchmarks demonstrate that PCFG-NAT further narrows the gap in
translation quality between NAT and AT models. Moreover, PCFG-NAT facilitates a
deeper understanding of the generated sentences, addressing the lack of
satisfactory explainability in neural machine translation. Code is publicly
available at https://github.com/ictnlp/PCFG-NAT.
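To make the role of the grammar concrete, the sketch below scores a short token sequence under a toy PCFG with the inside (CKY) algorithm: because the grammar couples spans of tokens, the sequence probability is no longer a product of independent per-token probabilities, which is the kind of dependency the abstract refers to. The grammar, rule probabilities, and helper names here are illustrative assumptions for exposition only, not the structures learned by PCFG-NAT (see the linked repository for the actual implementation).

```python
# Toy PCFG in Chomsky normal form, scored with the inside (CKY) algorithm.
# Rules and probabilities are made up for illustration; they are NOT the
# grammar learned by PCFG-NAT.
from collections import defaultdict

# (parent, (child,)) for lexical rules, (parent, (left, right)) for binary rules.
rules = {
    ("S",   ("NP", "VP")):    1.0,
    ("NP",  ("we",)):         1.0,
    ("VP",  ("V", "OBJ")):    1.0,
    ("V",   ("propose",)):    1.0,
    ("OBJ", ("PCFG-NAT",)):   1.0,
}

def inside_probability(tokens, rules, root="S"):
    n = len(tokens)
    # chart[i][j] maps a nonterminal to the inside probability of span tokens[i:j].
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    # Lexical rules fill spans of length 1.
    for i, tok in enumerate(tokens):
        for (parent, children), p in rules.items():
            if children == (tok,):
                chart[i][i + 1][parent] += p
    # Binary rules combine adjacent spans, shortest spans first.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (parent, children), p in rules.items():
                    if len(children) == 2:
                        left, right = children
                        chart[i][j][parent] += p * chart[i][k][left] * chart[k][j][right]
    return chart[0][n][root]

print(inside_probability(["we", "propose", "PCFG-NAT"], rules))  # -> 1.0
```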
Related papers
- Revisiting Non-Autoregressive Translation at Scale [76.93869248715664]
We systematically study the impact of scaling on non-autoregressive translation (NAT) behaviors.
We show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance.
We establish a new benchmark by validating scaled NAT models on a scaled dataset.
arXiv Detail & Related papers (2023-05-25T15:22:47Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Non-Autoregressive Document-Level Machine Translation [35.48195990457836]
Non-autoregressive translation (NAT) models achieve comparable performance and superior speed compared to auto-regressive translation (AT) models.
However, their abilities remain unexplored in document-level machine translation (MT).
We propose a simple but effective design of sentence alignment between source and target.
arXiv Detail & Related papers (2023-05-22T09:59:59Z)
- Rephrasing the Reference for Non-Autoregressive Machine Translation [37.816198073720614]
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: a single source sentence may have multiple possible translations.
We introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output.
Our best variant achieves comparable performance to the autoregressive Transformer, while being 14.7 times more efficient in inference.
arXiv Detail & Related papers (2022-11-30T10:05:03Z)
- On the Learning of Non-Autoregressive Transformers [91.34196047466904]
Non-autoregressive Transformer (NAT) is a family of text generation models.
We present theoretical and empirical analyses to reveal the challenges of NAT learning.
arXiv Detail & Related papers (2022-06-13T08:42:09Z)
- Directed Acyclic Transformer for Non-Autoregressive Machine Translation [93.31114105366461]
The Directed Acyclic Transformer (DA-Transformer) represents hidden states in a Directed Acyclic Graph (DAG).
DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average.
arXiv Detail & Related papers (2022-05-16T06:02:29Z)
- Sequence-Level Training for Non-Autoregressive Neural Machine Translation [33.17341980163439]
Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup.
We propose using sequence-level training objectives to train NAT models, which evaluate the NAT outputs as a whole and correlate well with the real translation quality.
arXiv Detail & Related papers (2021-06-15T13:30:09Z)
- Modeling Coverage for Non-Autoregressive Neural Machine Translation [9.173385214565451]
We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z)
- Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade [47.97977478431973]
Fully non-autoregressive neural machine translation (NAT) is proposed to predict all tokens simultaneously with a single forward pass of the neural network.
In this work, we aim to close the performance gap while maintaining the latency advantage.
arXiv Detail & Related papers (2020-12-31T18:52:59Z)
- Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation [188.3605563567253]
Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT).
We introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT covers AT and NAT as its special cases.
We design curriculum schedules to gradually shift k from 1 to N, with different pacing functions and numbers of tasks trained at the same time (a rough sketch of such pacing functions follows this list).
Experiments on IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines.
arXiv Detail & Related papers (2020-07-17T06:06:54Z)
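The TCL-NAT summary above mentions pacing functions that move k from 1 (fully autoregressive) toward N (fully non-autoregressive) as training progresses. The sketch below shows what such schedules could look like under common linear and square-root pacing choices; the function names and formulas are illustrative assumptions, not the schedules reported in that paper.

```python
# Hypothetical pacing functions for a task-level curriculum that grows the
# parallelism parameter k from 1 (autoregressive) to N (fully parallel).
# These are common illustrative choices, not the exact TCL-NAT schedules.
import math

def linear_pacing(step: int, total_steps: int, n: int) -> int:
    """k grows linearly with training progress."""
    progress = min(step / total_steps, 1.0)
    return 1 + round(progress * (n - 1))

def sqrt_pacing(step: int, total_steps: int, n: int) -> int:
    """k grows quickly early in training, then more slowly."""
    progress = min(step / total_steps, 1.0)
    return 1 + round(math.sqrt(progress) * (n - 1))

# Example: target length N = 20 over 100k training steps.
for step in (0, 25_000, 50_000, 100_000):
    print(step, linear_pacing(step, 100_000, 20), sqrt_pacing(step, 100_000, 20))
```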