Reducing Position Bias in Simultaneous Machine Translation with
Length-Aware Framework
- URL: http://arxiv.org/abs/2203.09053v1
- Date: Thu, 17 Mar 2022 03:18:46 GMT
- Title: Reducing Position Bias in Simultaneous Machine Translation with
Length-Aware Framework
- Authors: Shaolei Zhang, Yang Feng
- Abstract summary: Simultaneous machine translation (SiMT) starts translating while receiving the streaming source inputs.
We develop a Length-Aware Framework to reduce the position bias by bridging the structural gap between SiMT and full-sentence MT.
- Score: 21.03142288187605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous machine translation (SiMT) starts translating while receiving
the streaming source inputs, and hence the source sentence is always incomplete
during translation. Unlike full-sentence MT, which uses the conventional
sequence-to-sequence architecture, SiMT often applies a prefix-to-prefix architecture,
which forces each target word to align only with a partial source prefix to
adapt to the incomplete source in streaming inputs. However, the source words
in the front positions are mistakenly treated as more important since they
appear in more prefixes, resulting in position bias, which makes the model
pay more attention to the front source positions at test time. In this paper, we
first analyze the phenomenon of position bias in SiMT, and develop a
Length-Aware Framework to reduce the position bias by bridging the structural
gap between SiMT and full-sentence MT. Specifically, given the streaming
inputs, we first predict the full-sentence length and then fill the future
source position with positional encoding, thereby turning the streaming inputs
into a pseudo full-sentence. The proposed framework can be integrated into most
existing SiMT methods to further improve performance. Experiments on two
representative SiMT methods, including the state-of-the-art adaptive policy,
show that our method successfully reduces the position bias and achieves better
SiMT performance.
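As a rough illustration of the idea in the abstract, the sketch below turns a received source prefix into a pseudo full-sentence: positions already received carry token embedding plus positional encoding, while the predicted future positions are filled with positional encoding only. The length predictor itself is omitted (the predicted length is passed in as an argument), and the helper names are hypothetical, not taken from the paper's code.

```python
import numpy as np

def sinusoidal_pe(length, d_model):
    """Standard Transformer sinusoidal positional encoding, shape (length, d_model)."""
    pos = np.arange(length)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def pseudo_full_sentence(prefix_emb, predicted_len):
    """Turn a streaming source prefix into a pseudo full-sentence.

    prefix_emb    : (j, d_model) embeddings of the j source tokens received so far
    predicted_len : N, the predicted full-sentence length (N >= j)
    Returns       : (N, d_model) where received positions hold embedding + positional
                    encoding and future positions hold positional encoding only.
    """
    j, d_model = prefix_emb.shape
    pseudo = sinusoidal_pe(predicted_len, d_model)  # future positions: encoding only
    pseudo[:j] += prefix_emb                        # received positions: add embeddings
    return pseudo

# Toy usage: 3 tokens received, full-sentence length predicted as 7.
prefix = np.random.randn(3, 8)
print(pseudo_full_sentence(prefix, predicted_len=7).shape)  # (7, 8)
```

Because the pseudo full-sentence has the same length and positional layout as a complete source sentence, it can be fed to a conventional seq-to-seq style encoder, which is how the framework bridges the structural gap mentioned in the abstract.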
Related papers
- Decoder-only Streaming Transformer for Simultaneous Translation [31.558179590071973]
Simultaneous Machine Translation (SiMT) generates translation while reading source tokens, essentially producing the target prefix based on the source prefix.
We explore the potential of Decoder-only architecture, owing to its superior performance in various tasks and its inherent compatibility with SiMT.
We propose the first Decoder-only SiMT model, named Decoder-only Streaming Transformer (DST)
arXiv Detail & Related papers (2024-06-06T09:13:13Z) - Language Model is a Branch Predictor for Simultaneous Machine
Translation [73.82754138171587]
We propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency.
We utilize a language model as a branch predictor to predict potential branch directions.
When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output.
arXiv Detail & Related papers (2023-12-22T07:32:47Z) - CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation
with Weighted Prefix-to-Prefix Training [13.462260072313894]
Simultaneous machine translation (SiMT) is a challenging task that requires starting translation before the full source sentence is available.
The prefix-to-prefix framework is often applied to SiMT, which learns to predict target tokens using only a partial source prefix.
We propose a Confidence-Based Simultaneous Machine Translation framework, which uses model confidence to perceive hallucination tokens.
arXiv Detail & Related papers (2023-11-07T02:44:45Z) - Glancing Future for Simultaneous Machine Translation [35.46823126036308]
We propose a novel method to bridge the gap between prefix-to-prefix training and seq2seq training.
We gradually reduce the available source information from the whole sentence to the prefix corresponding to that latency.
Our method is applicable to a wide range of SiMT methods and experiments demonstrate that our method outperforms strong baselines.
arXiv Detail & Related papers (2023-09-12T12:46:20Z) - Gaussian Multi-head Attention for Simultaneous Machine Translation [21.03142288187605]
Simultaneous machine translation (SiMT) outputs translation while receiving the streaming source inputs.
We propose a new SiMT policy by modeling alignment and translation in a unified manner.
Experiments on En-Vi and De-En tasks show that our method outperforms strong baselines on the trade-off between translation and latency.
arXiv Detail & Related papers (2022-03-17T04:01:25Z) - Source and Target Bidirectional Knowledge Distillation for End-to-end
Speech Translation [88.78138830698173]
We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models.
We train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
arXiv Detail & Related papers (2021-04-13T19:00:51Z) - Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments have been conducted on the MS-COCO dataset demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z) - Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z) - Learning Contextualized Sentence Representations for Document-Level
Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z) - Explicit Sentence Compression for Neural Machine Translation [110.98786673598016]
State-of-the-art Transformer-based neural machine translation (NMT) systems still follow a standard encoder-decoder framework.
Backbone information, which conveys the gist of a sentence, is not specifically attended to.
We propose an explicit sentence compression method to enhance the source sentence representation for NMT.
arXiv Detail & Related papers (2019-12-27T04:14:06Z)
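Several of the papers above (e.g. CBSiMT's weighted prefix-to-prefix training and the prefix-to-prefix vs. seq2seq gap targeted by Glancing Future) build on the prefix-to-prefix architecture discussed in the main abstract. As a minimal, hypothetical illustration of why that architecture induces position bias, the sketch below counts how many decoding steps can attend to each source position under a fixed wait-k read schedule (wait-k is used here only as a representative fixed policy, not as the method of any specific paper listed):

```python
import numpy as np

def wait_k_source_mask(k, src_len, tgt_len):
    """Source-visibility mask for a prefix-to-prefix (wait-k) schedule.

    mask[t, s] is True when target step t may attend to source position s,
    i.e. s < min(k + t, src_len).
    """
    t = np.arange(tgt_len)[:, None]
    s = np.arange(src_len)[None, :]
    return s < np.minimum(k + t, src_len)

mask = wait_k_source_mask(k=1, src_len=5, tgt_len=5)
# How many target steps see each source position: front positions dominate.
print(mask.sum(axis=0))  # [5 4 3 2 1]
```

The column sums decay from the front to the back of the source, matching the abstract's observation that front positions appear in more prefixes and therefore attract disproportionate attention.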