Context-Aware Cross-Attention for Non-Autoregressive Translation
- URL: http://arxiv.org/abs/2011.00770v1
- Date: Mon, 2 Nov 2020 06:34:33 GMT
- Title: Context-Aware Cross-Attention for Non-Autoregressive Translation
- Authors: Liang Ding, Longyue Wang, Di Wu, Dacheng Tao and Zhaopeng Tu
- Abstract summary: Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence.
Due to the lack of target dependency modelling in the decoder, the conditional generation process heavily depends on the cross-attention.
We propose to enhance conventional cross-attention with signals from neighbouring source tokens.
- Score: 119.54611465693498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive translation (NAT) significantly accelerates the inference
process by predicting the entire target sequence. However, due to the lack of
target dependency modelling in the decoder, the conditional generation process
heavily depends on the cross-attention. In this paper, we reveal a localness
perception problem in NAT cross-attention, which makes it difficult to
adequately capture source context. To alleviate this problem, we propose to
enhance conventional cross-attention with signals from neighbouring source tokens.
Experimental results on several representative datasets show that our approach
can consistently improve translation quality over strong NAT baselines.
Extensive analyses demonstrate that the enhanced cross-attention achieves
better exploitation of source contexts by leveraging both local and global
information.
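To make the idea concrete, here is a minimal, hypothetical sketch of how neighbouring source tokens could be emphasised inside cross-attention: global attention is mixed with a locally masked variant centred on each target position's most-attended source token. The window size, mixing weight, and centring heuristic are illustrative assumptions, not the paper's exact formulation.

```python
import math
import torch
import torch.nn.functional as F

def context_aware_cross_attention(q, k, v, window=3, local_weight=0.5):
    # q: (tgt_len, d); k, v: (src_len, d). Single head, no batching,
    # to keep the sketch short.
    d = q.size(-1)
    scores = q @ k.t() / math.sqrt(d)                  # (tgt_len, src_len)
    global_attn = F.softmax(scores, dim=-1)

    # Centre a local window on each target position's most-attended
    # source token, then renormalise attention inside that window.
    centre = scores.argmax(dim=-1, keepdim=True)       # (tgt_len, 1)
    src_pos = torch.arange(k.size(0)).unsqueeze(0)     # (1, src_len)
    local_mask = (src_pos - centre).abs() <= window    # (tgt_len, src_len)
    local_attn = F.softmax(scores.masked_fill(~local_mask, float("-inf")), dim=-1)

    # Mix the global and local views of the source context.
    attn = (1 - local_weight) * global_attn + local_weight * local_attn
    return attn @ v

out = context_aware_cross_attention(torch.randn(5, 64), torch.randn(8, 64), torch.randn(8, 64))
print(out.shape)  # torch.Size([5, 64])
```

The mixing weight trades off local against global source information; in the paper's analysis, combining both views is what improves exploitation of source context.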
Related papers
- Revisiting Non-Autoregressive Translation at Scale [76.93869248715664]
We systematically study the impact of scaling on non-autoregressive translation (NAT) behaviors.
We show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance.
We establish a new benchmark by validating scaled NAT models on a scaled dataset.
arXiv Detail & Related papers (2023-05-25T15:22:47Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation [0.0]
Non-autoregressive neural machine translation (NAT) offers a substantial translation speed-up compared to autoregressive neural machine translation (AT), at the cost of lower translation quality.
Latent variable modeling has emerged as a promising approach to bridge this quality gap.
arXiv Detail & Related papers (2023-05-02T15:33:09Z)
- Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive Machine Translation [18.205288788056787]
Non-autoregressive translation (NAT) reduces the decoding latency but suffers from performance degradation due to the multi-modality problem.
In this paper, we hold the view that all paths in the graph are fuzzily aligned with the reference sentence.
We do not require the exact alignment but train the model to maximize a fuzzy alignment score between the graph and the reference, which takes translations captured in all modalities into account; a toy sketch of such a fuzzy matching score appears after this list.
arXiv Detail & Related papers (2023-03-12T13:51:38Z)
- Guided Image-to-Image Translation by Discriminator-Generator Communication [71.86347329356244]
The goal of image-to-image (I2I) translation is to transfer an image from a source domain to a target domain.
One major branch of this research formulates I2I translation with Generative Adversarial Networks (GANs).
arXiv Detail & Related papers (2023-03-07T02:29:36Z)
- Zero-shot-Learning Cross-Modality Data Translation Through Mutual Information Guided Stochastic Diffusion [5.795193288204816]
Cross-modality data translation has attracted great interest in image computing.
This paper proposes a new unsupervised zero-shot-learning method named Mutual Information Diffusion guided cross-modality data translation Model (MIDiffusion).
We empirically show the advanced performance of MIDiffusion in comparison with an influential group of generative models.
arXiv Detail & Related papers (2023-01-31T16:24:34Z)
- Rephrasing the Reference for Non-Autoregressive Machine Translation [37.816198073720614]
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: a source sentence may have multiple possible translations.
We introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output.
Our best variant achieves comparable performance to the autoregressive Transformer, while being 14.7 times more efficient in inference.
arXiv Detail & Related papers (2022-11-30T10:05:03Z)
- ConNER: Consistency Training for Cross-lingual Named Entity Recognition [96.84391089120847]
Cross-lingual named entity recognition suffers from data scarcity in the target languages.
We propose ConNER as a novel consistency training framework for cross-lingual NER.
arXiv Detail & Related papers (2022-11-17T07:57:54Z)
- On the Learning of Non-Autoregressive Transformers [91.34196047466904]
The non-autoregressive Transformer (NAT) is a family of text generation models.
We present theoretical and empirical analyses to reveal the challenges of NAT learning.
arXiv Detail & Related papers (2022-06-13T08:42:09Z)
- Modeling Coverage for Non-Autoregressive Neural Machine Translation [9.173385214565451]
We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z)
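As flagged in the Fuzzy Alignments entry above, the following is a toy illustration of a fuzzy alignment score between one decoded candidate and a reference, using clipped n-gram overlap. The real method scores all paths of the DAG in expectation; this stand-alone function, its name, and its normalisation are assumptions made for illustration only.

```python
# Toy fuzzy alignment score: clipped n-gram overlap between a candidate
# decoded from one DAG path and the reference. Illustrative only; the
# paper optimises an expected n-gram match over all paths in the graph.
from collections import Counter

def fuzzy_alignment_score(candidate, reference, max_n=2):
    score, total = 0.0, 0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # Clipped overlap: each reference n-gram can be matched at most once.
        score += sum(min(c, ref[g]) for g, c in cand.items())
        total += max(sum(cand.values()), 1)
    return score / max(total, 1)

# Word-order variation still earns partial credit instead of a hard mismatch.
print(fuzzy_alignment_score("a b c d".split(), "a b d c".split()))  # ~0.714
```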