Anticipation-free Training for Simultaneous Translation
- URL: http://arxiv.org/abs/2201.12868v1
- Date: Sun, 30 Jan 2022 16:29:37 GMT
- Title: Anticipation-free Training for Simultaneous Translation
- Authors: Chih-Chiang Chang, Shun-Po Chuang, Hung-yi Lee
- Abstract summary: Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
- Score: 70.85761141178597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous translation (SimulMT) speeds up the translation process by
starting to translate before the source sentence is completely available. It is
difficult due to limited context and word order difference between languages.
Existing methods increase latency or introduce adaptive read-write policies for
SimulMT models to handle local reordering and improve translation quality.
However, long-distance reordering can cause SimulMT models to learn translation
incorrectly. Specifically, the model may be forced to predict target
tokens when the corresponding source tokens have not been read. This leads to
aggressive anticipation during inference, resulting in the hallucination
phenomenon. To mitigate this problem, we propose a new framework that decomposes
the translation process into the monotonic translation step and the reordering
step, and we model the latter by the auxiliary sorting network (ASN). The ASN
rearranges the hidden states to match the order in the target language, so that
the SimulMT model could learn to translate more reasonably. The entire model is
optimized end-to-end and does not rely on external aligners or data. During
inference, ASN is removed to achieve streaming. Experiments show the proposed
framework could outperform previous methods with less latency. The
source code is available.
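Below is a minimal, hypothetical PyTorch sketch of the idea described in the abstract: an auxiliary sorting network (ASN) softly rearranges hidden states toward target-language order during training and is simply skipped at inference to allow streaming. The module names and the soft-attention permutation are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative approximation of an auxiliary sorting network (ASN), not the
# paper's exact architecture: a soft permutation reorders hidden states toward
# target-language order during training and is dropped at inference.
import torch
import torch.nn as nn


class AuxiliarySortingNetwork(nn.Module):
    """Produces a soft permutation of hidden states (used in training only)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [batch, src_len, d_model]
        q = self.query(hidden)                       # each position asks "what should sit here?"
        k = self.key(hidden)
        scores = q @ k.transpose(1, 2) / hidden.size(-1) ** 0.5
        perm = scores.softmax(dim=-1)                # soft, row-stochastic permutation
        return perm @ hidden                         # states rearranged toward target order


class SimulMTWithASN(nn.Module):
    def __init__(self, d_model: int = 256, vocab: int = 1000):
        super().__init__()
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.asn = AuxiliarySortingNetwork(d_model)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, src_emb: torch.Tensor) -> torch.Tensor:
        h, _ = self.encoder(src_emb)
        # During training the ASN handles reordering so the rest of the model can
        # translate monotonically; at inference the ASN is removed for streaming.
        if self.training:
            h = self.asn(h)
        return self.out(h)


model = SimulMTWithASN()
train_logits = model(torch.randn(2, 7, 256))    # training-time forward pass with ASN
model.eval()
stream_logits = model(torch.randn(2, 7, 256))   # ASN skipped at inference
```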
Related papers
- ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation [38.30649186517611]
This paper introduces an Auto-Constriction Turning mechanism for Multilingual Neural Machine Translation (ACT-MNMT).
arXiv Detail & Related papers (2024-03-11T14:10:57Z)
- Language Model is a Branch Predictor for Simultaneous Machine Translation [73.82754138171587]
We propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency.
We utilize a language model as a branch predictor to predict potential branch directions.
When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output.
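As a rough illustration of this predict-then-verify loop, here is a hedged Python sketch; `predict_next_source` and `translate_prefix` are hypothetical stand-ins for the language-model branch predictor and the SiMT decoder, not the paper's APIs.

```python
# Hedged sketch of branch-prediction decoding for SiMT: decode speculatively with a
# predicted source word, then re-decode when the real word turns out to differ.
from typing import Callable, List


def branch_predict_decode(
    source_stream: List[str],
    predict_next_source: Callable[[List[str]], str],
    translate_prefix: Callable[[List[str]], List[str]],
) -> List[str]:
    read: List[str] = []
    output: List[str] = []
    for actual_word in source_stream:
        # Speculate on the upcoming source word and decode ahead of time.
        guess = predict_next_source(read)
        speculative = translate_prefix(read + [guess])
        read.append(actual_word)
        if guess == actual_word:
            output = speculative              # prediction correct: keep speculative output
        else:
            output = translate_prefix(read)   # misprediction: re-decode with the real word
    return output


# Toy usage with trivial stand-in components.
print(branch_predict_decode(
    ["guten", "morgen"],
    predict_next_source=lambda prefix: "guten" if not prefix else "tag",
    translate_prefix=lambda prefix: ["good"] + (["morning"] if "morgen" in prefix else []),
))
```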
arXiv Detail & Related papers (2023-12-22T07:32:47Z)
- CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation with Weighted Prefix-to-Prefix Training [13.462260072313894]
Simultaneous machine translation (SiMT) is a challenging task that requires starting translation before the full source sentence is available.
The prefix-to-prefix framework is often applied to SiMT, which learns to predict target tokens using only a partial source prefix.
We propose a Confidence-Based Simultaneous Machine Translation framework, which uses model confidence to perceive hallucination tokens.
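A hedged sketch of how confidence-weighted prefix-to-prefix training might look: per-token losses are scaled by the model's own confidence so that likely-hallucinated tokens contribute less. The specific weighting below is illustrative, not necessarily CBSiMT's exact formulation.

```python
# Illustrative confidence-weighted token loss: tokens the model assigns low
# probability to (often those predicted before their source evidence is read)
# are down-weighted during training.
import torch
import torch.nn.functional as F


def confidence_weighted_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: [batch, tgt_len, vocab]; targets: [batch, tgt_len]
    log_probs = F.log_softmax(logits, dim=-1)
    token_nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # [batch, tgt_len]
    # Confidence = probability the model assigns to the reference token.
    confidence = token_nll.neg().exp().detach()
    # Low-confidence tokens get small weights; weights are renormalized per batch.
    weights = confidence / confidence.sum().clamp_min(1e-8)
    return (weights * token_nll).sum()


loss = confidence_weighted_loss(torch.randn(2, 5, 100, requires_grad=True),
                                torch.randint(0, 100, (2, 5)))
loss.backward()
```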
arXiv Detail & Related papers (2023-11-07T02:44:45Z)
- Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
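An illustrative sketch of the post-editing setup: the LLM receives the source plus a draft MT output and refines it instead of translating from scratch. The prompt template and the `llm_complete` hook are hypothetical placeholders, not the paper's exact prompts or interface.

```python
# Minimal automatic post-editing (APE) sketch with a stubbed LLM call.
from typing import Callable

APE_PROMPT = (
    "Source ({src_lang}): {source}\n"
    "Draft translation ({tgt_lang}): {draft}\n"
    "Improve the draft translation. Output only the corrected translation."
)


def post_edit(source: str, draft: str, llm_complete: Callable[[str], str],
              src_lang: str = "German", tgt_lang: str = "English") -> str:
    prompt = APE_PROMPT.format(src_lang=src_lang, source=source,
                               tgt_lang=tgt_lang, draft=draft)
    return llm_complete(prompt)


# Toy usage: a lambda stands in for a real LLM completion call.
print(post_edit("Guten Morgen", "Good Morgen", llm_complete=lambda p: "Good morning"))
```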
arXiv Detail & Related papers (2023-10-23T12:22:15Z)
- TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework using examples in comparison to teach LLMs to learn translation.
Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning.
Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
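A minimal sketch of a preference loss over paired correct and incorrect translations: the model is pushed to score the correct one higher by a margin. The margin formulation is an assumed stand-in for the paper's exact objective.

```python
# Margin-based preference loss over sequence log-likelihoods of paired examples.
import torch


def preference_loss(good_logprob: torch.Tensor, bad_logprob: torch.Tensor,
                    margin: float = 1.0) -> torch.Tensor:
    # Penalize cases where the incorrect translation is not scored sufficiently
    # below the correct one.
    return torch.clamp(margin - (good_logprob - bad_logprob), min=0.0).mean()


good = torch.tensor([-10.2, -8.7], requires_grad=True)   # log p(correct translation)
bad = torch.tensor([-9.5, -12.0], requires_grad=True)    # log p(incorrect translation)
preference_loss(good, bad).backward()
```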
arXiv Detail & Related papers (2023-07-10T08:15:40Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
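A toy sketch of building such full-sentence-like pretraining inputs by local reordering and word replacement. The random replacement here is an assumption standing in for the paper's context-based replacement, purely for illustration.

```python
# Corrupt a sentence so it still looks like a real sentence: shuffle words within
# small local windows and replace a few words with other real words (no [MASK]).
import random
from typing import List


def corrupt_sentence(tokens: List[str], vocab: List[str],
                     shuffle_window: int = 3, replace_prob: float = 0.15,
                     seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    out = tokens[:]
    # Local reordering: perturb order only within short windows.
    for start in range(0, len(out), shuffle_window):
        window = out[start:start + shuffle_window]
        rng.shuffle(window)
        out[start:start + shuffle_window] = window
    # Word replacement: swap some tokens for other vocabulary items.
    return [rng.choice(vocab) if rng.random() < replace_prob else tok for tok in out]


print(corrupt_sentence("the cat sat on the mat".split(), vocab=["dog", "ran", "big"]))
```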
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation [10.773010211146694]
We propose a faster re-translation system based on a non-autoregressive sequence generation model (FReTNA).
The proposed model reduces the average computation time by a factor of 20 when compared to the ReTA model.
It also outperforms the streaming-based Wait-k model both in terms of time (1.5 times lower) and translation quality.
arXiv Detail & Related papers (2020-12-29T09:43:27Z)
- Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
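A minimal sketch of this multi-task training signal, with an assumed auxiliary loss weight; the actual framework's objectives and weighting may differ.

```python
# Combine the translation loss with auxiliary losses for predicting the
# surrounding source sentences, so the encoder learns cross-sentence context.
import torch


def document_level_loss(translation_loss: torch.Tensor,
                        prev_sentence_loss: torch.Tensor,
                        next_sentence_loss: torch.Tensor,
                        aux_weight: float = 0.5) -> torch.Tensor:
    # aux_weight is an illustrative hyperparameter balancing the auxiliary tasks.
    return translation_loss + aux_weight * (prev_sentence_loss + next_sentence_loss)


total = document_level_loss(torch.tensor(2.3), torch.tensor(3.1), torch.tensor(2.8))
```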
arXiv Detail & Related papers (2020-03-30T03:38:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.