Gaussian Multi-head Attention for Simultaneous Machine Translation
- URL: http://arxiv.org/abs/2203.09072v1
- Date: Thu, 17 Mar 2022 04:01:25 GMT
- Title: Gaussian Multi-head Attention for Simultaneous Machine Translation
- Authors: Shaolei Zhang, Yang Feng
- Abstract summary: Simultaneous machine translation (SiMT) outputs translation while receiving the streaming source inputs.
We propose a new SiMT policy by modeling alignment and translation in a unified manner.
Experiments on En-Vi and De-En tasks show that our method outperforms strong baselines on the trade-off between translation quality and latency.
- Score: 21.03142288187605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous machine translation (SiMT) outputs the translation while receiving
the streaming source inputs, and hence needs a policy to determine when to start
translating. The alignment between target and source words often identifies the most
informative source word for each target word, and hence offers unified control over
translation quality and latency; unfortunately, existing SiMT methods do not explicitly
model this alignment to perform such control. In this paper, we propose Gaussian
Multi-head Attention (GMA) to develop a new SiMT policy by modeling alignment and
translation in a unified manner. For the SiMT policy, GMA models the aligned source
position of each target word and accordingly waits until that aligned position has been
received before starting to translate. To integrate the learning of alignment into the
translation model, a Gaussian distribution centered on the predicted aligned position is
introduced as an alignment-related prior, which cooperates with the translation-related
soft attention to determine the final attention. Experiments on En-Vi and De-En tasks
show that our method outperforms strong baselines on the trade-off between translation
quality and latency.
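
As a rough illustration of how an alignment-related Gaussian prior can cooperate with translation-related soft attention, the PyTorch sketch below reweights standard dot-product attention by a Gaussian centered on a predicted aligned source position and renormalizes the result. This is a minimal sketch under my own assumptions (the function name `gaussian_prior_attention`, a fixed `sigma`, and an externally supplied `predicted_pos` are illustrative); the paper's exact parameterization of the predicted position, the per-head variance, and the streaming prefix restriction may differ.

```python
import torch
import torch.nn.functional as F

def gaussian_prior_attention(query, key, value, predicted_pos, sigma=1.0):
    """Soft attention reweighted by a Gaussian prior centered on a predicted aligned position.

    query: (batch, tgt_len, d); key, value: (batch, src_len, d)
    predicted_pos: (batch, tgt_len) predicted aligned source index for each target step
    In the streaming (SiMT) setting, key/value would only cover the received source prefix.
    """
    d = query.size(-1)
    scores = torch.matmul(query, key.transpose(-1, -2)) / d ** 0.5   # (batch, tgt, src)
    soft_attn = F.softmax(scores, dim=-1)                            # translation-related attention

    src_idx = torch.arange(key.size(1), device=key.device, dtype=query.dtype)
    # Alignment-related prior: exp(-(j - p_i)^2 / (2 * sigma^2))
    prior = torch.exp(-((src_idx[None, None, :] - predicted_pos[..., None]) ** 2)
                      / (2.0 * sigma ** 2))

    attn = soft_attn * prior
    attn = attn / attn.sum(dim=-1, keepdim=True)                     # renormalize over source
    return torch.matmul(attn, value), attn

# Toy usage: batch of 1, 5 source tokens, 3 target steps, model dim 8.
q, k, v = torch.randn(1, 3, 8), torch.randn(1, 5, 8), torch.randn(1, 5, 8)
p = torch.tensor([[1.0, 2.0, 4.0]])        # hypothetical predicted aligned positions
out, attn = gaussian_prior_attention(q, k, v, p)
print(out.shape, attn.shape)               # torch.Size([1, 3, 8]) torch.Size([1, 3, 5])
```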
Related papers
- Language Model is a Branch Predictor for Simultaneous Machine Translation [73.82754138171587]
We propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency.
We utilize a language model as a branch predictor to predict potential branch directions.
When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output (a rough sketch of this loop follows this entry).
arXiv Detail & Related papers (2023-12-22T07:32:47Z)
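
The following Python sketch (my own simplification, not the paper's implementation) mimics that branch-prediction loop with hypothetical stand-in callables `lm_predict_next_source` and `translate_prefix`: speculate on the next source word, translate with the speculation, and re-decode with the real word whenever the speculation turns out wrong.

```python
def speculative_simt(source_stream, lm_predict_next_source, translate_prefix):
    """Toy branch-prediction loop for SiMT (hypothetical helpers supplied by the caller)."""
    prefix = []        # source words received so far
    outputs = []
    for real_word in source_stream:
        # Speculate on the upcoming source word using only the received prefix.
        predicted = lm_predict_next_source(prefix)
        speculative_out = translate_prefix(prefix + [predicted])
        if predicted == real_word:
            outputs = speculative_out                         # keep the speculative translation
        else:
            outputs = translate_prefix(prefix + [real_word])  # re-decode with the real word
        prefix.append(real_word)
    return outputs

# Toy usage with trivial stand-ins: "predict" by repeating the last word, "translate" by uppercasing.
demo = speculative_simt(
    ["wir", "sehen", "uns"],
    lm_predict_next_source=lambda prefix: prefix[-1] if prefix else "<s>",
    translate_prefix=lambda words: [w.upper() for w in words],
)
print(demo)   # ['WIR', 'SEHEN', 'UNS']
```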
- Simultaneous Machine Translation with Tailored Reference [35.46823126036308]
Simultaneous machine translation (SiMT) generates the translation while the source sentence is still being read.
Existing SiMT models are typically trained with the same reference, disregarding the varying amount of source information available at different latency.
We propose a novel method that provides a tailored reference for SiMT models trained at different latency by rephrasing the ground truth.
arXiv Detail & Related papers (2023-10-20T15:32:26Z)
- Principled Paraphrase Generation with Parallel Corpora [52.78059089341062]
We formalize the implicit similarity function induced by round-trip Machine Translation.
We show that it is susceptible to non-paraphrase pairs sharing a single ambiguous translation.
We design an alternative similarity metric that mitigates this issue.
arXiv Detail & Related papers (2022-05-24T17:22:42Z)
- Reducing Position Bias in Simultaneous Machine Translation with Length-Aware Framework [21.03142288187605]
Simultaneous machine translation (SiMT) starts translating while receiving the streaming source inputs.
We develop a Length-Aware Framework to reduce the position bias by bridging the structural gap between SiMT and full-sentence MT.
arXiv Detail & Related papers (2022-03-17T03:18:46Z)
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods either increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention network learn source representations with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)
- Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z)