Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive
Machine Translation
- URL: http://arxiv.org/abs/2303.06662v2
- Date: Mon, 17 Jul 2023 07:21:50 GMT
- Title: Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive
Machine Translation
- Authors: Zhengrui Ma, Chenze Shao, Shangtong Gui, Min Zhang and Yang Feng
- Abstract summary: Non-autoregressive translation (NAT) reduces the decoding latency but suffers from performance degradation due to the multi-modality problem.
In this paper, we hold the view that all paths in the graph are fuzzily aligned with the reference sentence.
We do not require the exact alignment but train the model to maximize a fuzzy alignment score between the graph and reference, which takes translations captured in all modalities into account.
- Score: 18.205288788056787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive translation (NAT) reduces the decoding latency but suffers
from performance degradation due to the multi-modality problem. Recently, the
structure of directed acyclic graph has achieved great success in NAT, which
tackles the multi-modality problem by introducing dependency between vertices.
However, training it with negative log-likelihood loss implicitly requires a
strict alignment between reference tokens and vertices, weakening its ability
to handle multiple translation modalities. In this paper, we hold the view that
all paths in the graph are fuzzily aligned with the reference sentence. We do
not require the exact alignment but train the model to maximize a fuzzy
alignment score between the graph and reference, which takes the translations
captured in all modalities into account. Extensive experiments on major WMT
benchmarks show that our method substantially improves translation performance
and increases prediction confidence, setting a new state of the art for NAT on
the raw training data.
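To make the idea concrete, here is a minimal sketch (not the authors' implementation) of a fuzzy-alignment-style objective: it marginalizes vertex occupancy over the DAG, compares the expected token counts with the reference counts via a soft F1, and maximizes that score instead of a position-by-position log-likelihood. The unigram (bag-of-words) simplification and the names `emission`/`transition` are assumptions made for illustration; the paper's actual score is computed over n-grams along the graph.

```python
# Illustrative sketch only: a unigram "fuzzy alignment" between a DAG and a
# reference. The paper's objective uses n-gram matching over the graph; the
# names `emission`/`transition` and the bag-of-words simplification are
# assumptions made for this example.
import torch

def vertex_marginals(transition: torch.Tensor) -> torch.Tensor:
    """Probability that each vertex is visited by a path starting at vertex 0.

    `transition` is an upper-triangular (V x V) matrix whose rows are
    normalized outgoing transition distributions.
    """
    num_vertices = transition.size(0)
    occupancy = [transition.new_zeros(()) for _ in range(num_vertices)]
    occupancy[0] = transition.new_ones(())   # every path starts at vertex 0
    for j in range(1, num_vertices):
        occupancy[j] = sum(occupancy[i] * transition[i, j] for i in range(j))
    return torch.stack(occupancy)

def fuzzy_alignment_loss(emission: torch.Tensor,
                         transition: torch.Tensor,
                         reference: list[int]) -> torch.Tensor:
    """Negative soft F1 between expected token counts and reference counts.

    `emission` is (V x vocab): per-vertex token distributions.
    """
    occupancy = vertex_marginals(transition)                   # (V,)
    expected = (occupancy.unsqueeze(1) * emission).sum(dim=0)  # expected count per token
    ref_counts = torch.zeros_like(expected)
    for tok in reference:
        ref_counts[tok] += 1.0
    matched = torch.minimum(expected, ref_counts).sum()        # clipped soft matches
    precision = matched / expected.sum().clamp(min=1e-6)
    recall = matched / ref_counts.sum().clamp(min=1e-6)
    f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-6)
    return -f1   # maximizing fuzzy alignment = minimizing its negative
```

Because no single path has to reproduce the reference token by token, probability mass spread across several plausible translations is not penalized the way a strict alignment loss would penalize it.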
Related papers
- Revisiting Non-Autoregressive Translation at Scale [76.93869248715664]
We systematically study the impact of scaling on non-autoregressive translation (NAT) behaviors.
We show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance.
We establish a new benchmark by validating scaled NAT models on a scaled dataset.
arXiv Detail & Related papers (2023-05-25T15:22:47Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Rephrasing the Reference for Non-Autoregressive Machine Translation [37.816198073720614]
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: multiple possible translations may exist for a single source sentence.
We introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output.
Our best variant achieves comparable performance to the autoregressive Transformer, while being 14.7 times more efficient in inference.
arXiv Detail & Related papers (2022-11-30T10:05:03Z)
- ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation [51.06378042344563]
The order-agnostic cross entropy (oaxe) training loss has proven effective at ameliorating the effect of multimodality for non-autoregressive translation (NAT).
We extend oaxe by only allowing reordering between ngram phrases and still requiring a strict match of word order within the phrases.
Further analyses show that ngram-oaxe indeed improves the translation of ngram phrases, and produces more fluent translation with a better modeling of sentence structure.
arXiv Detail & Related papers (2022-10-08T11:39:15Z)
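For context on the loss being extended above: the base OAXE objective scores the model under the lowest-cost one-to-one assignment between reference tokens and output positions rather than under a fixed order. A minimal sketch of that idea follows; `log_probs` and `target` are hypothetical names, and this is an illustration rather than the released implementation.

```python
# Minimal sketch of an order-agnostic cross entropy (the base OAXE idea that
# ngram-OAXE extends to phrase-level reordering). Hypothetical tensor names.
import torch
from scipy.optimize import linear_sum_assignment

def order_agnostic_xent(log_probs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """log_probs: (L x vocab) position-wise log-probabilities (L = target length).
    target: (L,) reference token ids. Returns XE under the best token ordering."""
    # cost[i, j] = -log P(target token j emitted at output position i)
    cost = -log_probs[:, target]                                   # (L x L)
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
    return cost[rows, cols].mean()       # cross entropy under the optimal assignment
```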
- Directed Acyclic Transformer for Non-Autoregressive Machine Translation [93.31114105366461]
Directed Acyclic Transformer (DA-Transformer) represents hidden states in a Directed Acyclic Graph (DAG).
DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average.
arXiv Detail & Related papers (2022-05-16T06:02:29Z)
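As a rough illustration of the DAG parameterization that the fuzzy-alignment work above builds on: the probability of one decoding path factorizes into vertex-to-vertex transition probabilities and per-vertex token probabilities, and training marginalizes over all such paths. The sketch below scores a single given path; `transition` and `emission` are hypothetical names, not the released implementation.

```python
# Illustrative sketch of a single-path probability in a DAG decoder.
# `transition` (V x V) and `emission` (V x vocab) are hypothetical names;
# the full model marginalizes over all paths with dynamic programming.
import torch

def path_log_prob(path: list[int], tokens: list[int],
                  transition: torch.Tensor, emission: torch.Tensor) -> torch.Tensor:
    """path: visited vertex indices (strictly increasing, starting at 0).
    tokens: the token emitted at each visited vertex (same length as path)."""
    logp = emission.new_zeros(())
    for step, (vertex, token) in enumerate(zip(path, tokens)):
        logp = logp + torch.log(emission[vertex, token])                 # emit token at vertex
        if step + 1 < len(path):
            logp = logp + torch.log(transition[vertex, path[step + 1]])  # hop to next vertex
    return logp
```

The negative log-likelihood training criticized in the abstract effectively asks some path to align exactly with the reference; the fuzzy alignment score relaxes that constraint.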
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Sequence-Level Training for Non-Autoregressive Neural Machine Translation [33.17341980163439]
Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup.
We propose using sequence-level training objectives to train NAT models, which evaluate the NAT outputs as a whole and correlate well with the real translation quality.
arXiv Detail & Related papers (2021-06-15T13:30:09Z)
- Modeling Coverage for Non-Autoregressive Neural Machine Translation [9.173385214565451]
We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z)
- Context-Aware Cross-Attention for Non-Autoregressive Translation [119.54611465693498]
Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence.
Due to the lack of target dependency modelling in the decoder, the conditional generation process heavily depends on the cross-attention.
We propose to incorporate signals from neighbouring source tokens into the conventional cross-attention.
arXiv Detail & Related papers (2020-11-02T06:34:33Z)