Improving Simultaneous Translation by Incorporating Pseudo-References
with Fewer Reorderings
- URL: http://arxiv.org/abs/2010.11247v2
- Date: Thu, 23 Sep 2021 17:33:35 GMT
- Title: Improving Simultaneous Translation by Incorporating Pseudo-References
with Fewer Reorderings
- Authors: Junkun Chen, Renjie Zheng, Atsuhito Kita, Mingbo Ma, Liang Huang
- Abstract summary: We propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translation.
Experiments on Zh->En and Ja->En simultaneous translation show substantial improvements with the addition of these generated pseudo-references.
- Score: 24.997435410680378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous translation is vastly different from full-sentence
translation in that it starts translating before the source sentence ends,
with a delay of only a few words. However, due to the lack of large-scale, high-quality
simultaneous translation datasets, most such systems are still trained on
conventional full-sentence bitexts. This is far from ideal for the simultaneous
scenario due to the abundance of unnecessary long-distance reorderings in those
bitexts. We propose a novel method that rewrites the target side of existing
full-sentence corpora into simultaneous-style translation. Experiments on
Zh->En and Ja->En simultaneous translation show substantial improvements (up to
+2.7 BLEU) with the addition of these generated pseudo-references.
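The core idea, preferring target sides with fewer long-distance reorderings, can be illustrated with a simple crossing-count metric over word alignments. The sketch below is not the paper's actual rewriting method; `count_crossings`, `prefer_monotonic`, and the toy alignments (source-index, target-index pairs) are hypothetical illustrations of the kind of monotonicity measure that motivates pseudo-reference selection.

```python
def count_crossings(alignment):
    """Count crossing alignment links: pairs (i1, j1), (i2, j2)
    where the source order and target order disagree."""
    crossings = 0
    for a in range(len(alignment)):
        for b in range(a + 1, len(alignment)):
            (i1, j1), (i2, j2) = alignment[a], alignment[b]
            if (i1 - i2) * (j1 - j2) < 0:  # orders disagree -> reordering
                crossings += 1
    return crossings

def prefer_monotonic(reference_align, pseudo_align):
    """Keep whichever target side shows fewer reorderings."""
    if count_crossings(pseudo_align) < count_crossings(reference_align):
        return "pseudo"
    return "reference"

# A fully monotonic alignment has no crossings; a reversed one crosses everywhere.
monotonic = [(0, 0), (1, 1), (2, 2)]
reordered = [(0, 2), (1, 1), (2, 0)]
print(count_crossings(monotonic))  # 0
print(count_crossings(reordered))  # 3
print(prefer_monotonic(reordered, monotonic))  # pseudo
```

For simultaneous translation, a target side with fewer crossings can be translated with a smaller read-ahead, which is why rewritten pseudo-references with fewer reorderings help wait-k style models.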
Related papers
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Modeling Context With Linear Attention for Scalable Document-Level Translation [72.41955536834702]
We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
arXiv Detail & Related papers (2022-10-16T03:41:50Z)
- Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z)
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
- BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation [53.55009917938002]
We propose to refine the mined bitexts via automatic editing.
Experiments demonstrate that our approach successfully improves the quality of CCMatrix mined bitext for 5 low-resource language-pairs and 10 translation directions by up to 8 BLEU points.
arXiv Detail & Related papers (2021-11-12T16:00:39Z)
- Monotonic Simultaneous Translation with Chunk-wise Reordering and Refinement [38.89496608319392]
We propose an algorithm to reorder and refine the target side of a full sentence translation corpus.
Words and phrases in the source and target sentences are aligned largely monotonically using word alignment and non-autoregressive neural machine translation.
The proposed approach improves BLEU scores and resulting translations exhibit enhanced monotonicity with source sentences.
arXiv Detail & Related papers (2021-10-18T22:51:21Z)
- Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation [22.316934668106526]
This study proposes a simple data augmentation method to handle long sentences.
We use only the given parallel corpora as the training data and generate long sentences by concatenating two sentences.
The translation quality is further improved by the proposed method, when combined with back-translation.
arXiv Detail & Related papers (2021-04-17T08:04:42Z)
- SimulEval: An Evaluation Toolkit for Simultaneous Translation [59.02724214432792]
Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario.
SimulEval is an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation.
arXiv Detail & Related papers (2020-07-31T17:44:41Z)
- Opportunistic Decoding with Timely Correction for Simultaneous Translation [28.897290991945734]
We propose an opportunistic decoding technique with timely correction ability, which always (over-)generates a certain number of extra words at each step to keep the audience on track with the latest information.
Experiments show our technique achieves substantial reduction in latency and up to +3.1 increase in BLEU, with revision rate under 8% in Chinese-to-English and English-to-Chinese translation.
arXiv Detail & Related papers (2020-05-02T01:41:02Z)
- Re-translation versus Streaming for Simultaneous Translation [14.800214853561823]
We study a problem in which revisions to the hypothesis beyond strictly appending words are permitted.
In this setting, we compare custom streaming approaches to re-translation.
We find re-translation to be as good or better than state-of-the-art streaming systems.
arXiv Detail & Related papers (2020-04-07T18:27:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.