Monotonic Simultaneous Translation with Chunk-wise Reordering and
Refinement
- URL: http://arxiv.org/abs/2110.09646v1
- Date: Mon, 18 Oct 2021 22:51:21 GMT
- Title: Monotonic Simultaneous Translation with Chunk-wise Reordering and
Refinement
- Authors: HyoJung Han, Seokchan Ahn, Yoonjung Choi, Insoo Chung, Sangha Kim,
Kyunghyun Cho
- Abstract summary: We propose an algorithm to reorder and refine the target side of a full sentence translation corpus.
The target side is rewritten so that the words/phrases between the source and target sentences are aligned largely monotonically, using word alignment and non-autoregressive neural machine translation.
The proposed approach improves BLEU scores, and the resulting translations exhibit enhanced monotonicity with respect to the source sentences.
- Score: 38.89496608319392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in simultaneous machine translation is often trained with
conventional full sentence translation corpora, leading to either excessive
latency or the necessity to anticipate as-yet-unarrived words when dealing with a
language pair whose word orders significantly differ. This is unlike human
simultaneous interpreters who produce largely monotonic translations at the
expense of the grammaticality of a sentence being translated. In this paper, we
thus propose an algorithm to reorder and refine the target side of a full
sentence translation corpus, so that the words/phrases between the source and
target sentences are aligned largely monotonically, using word alignment and
non-autoregressive neural machine translation. We then train a widely used
wait-k simultaneous translation model on this reordered-and-refined corpus. The
proposed approach improves BLEU scores, and the resulting translations exhibit
enhanced monotonicity with respect to the source sentences.
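As a rough illustration of the reordering idea described in the abstract, the sketch below groups target tokens into chunks and reorders the chunks to follow the earliest source position each chunk aligns to. The function name, the fixed-size chunking, and the toy alignment are assumptions made for illustration only; the paper's actual chunking strategy and the non-autoregressive refinement step are not shown here.

```python
# Minimal sketch (not the authors' code): given word alignments between a source
# and a target sentence, group contiguous target tokens into chunks and emit the
# chunks in the order of the earliest source position each chunk aligns to, so
# that the rewritten target follows the source word order largely monotonically.

def reorder_target_chunks(target_tokens, alignment, chunk_size=3):
    """alignment: iterable of (src_idx, tgt_idx) pairs, e.g. from an external word aligner."""
    # Map each target token to the smallest source index it aligns to.
    tgt_to_src = {}
    for src_idx, tgt_idx in alignment:
        tgt_to_src[tgt_idx] = min(tgt_to_src.get(tgt_idx, src_idx), src_idx)

    # Split the target into fixed-size chunks (a real system would chunk by phrases).
    chunks = [list(range(i, min(i + chunk_size, len(target_tokens))))
              for i in range(0, len(target_tokens), chunk_size)]

    # Order chunks by the earliest source position they align to; unaligned
    # chunks fall to the end, with original position as a tie-breaker.
    def chunk_key(chunk):
        src_positions = [tgt_to_src[t] for t in chunk if t in tgt_to_src]
        return (min(src_positions) if src_positions else float("inf"), chunk[0])

    reordered = sorted(chunks, key=chunk_key)
    return [target_tokens[t] for chunk in reordered for t in chunk]


# Toy English->German example with a verb-final target clause.
src = ["because", "he", "has", "read", "the", "book"]
tgt = ["weil", "er", "das", "Buch", "gelesen", "hat"]
align = [(0, 0), (1, 1), (2, 5), (3, 4), (4, 2), (5, 3)]  # (src_idx, tgt_idx) pairs
print(reorder_target_chunks(tgt, align, chunk_size=1))
# -> ['weil', 'er', 'hat', 'gelesen', 'das', 'Buch']
# The reordered target follows the source order at the expense of grammaticality;
# the paper's refinement step (omitted here) would then repair fluency.
```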
Related papers
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for
Non-Autoregressive Machine Translation [51.06378042344563]
The order-agnostic cross-entropy (OAXE) training loss has proven effective in ameliorating the effect of multimodality in non-autoregressive translation (NAT).
We extend OAXE by allowing reordering only between ngram phrases while still requiring a strict match of word order within each phrase.
Further analyses show that ngram-OAXE indeed improves the translation of ngram phrases and produces more fluent translations with better modeling of sentence structure.
arXiv Detail & Related papers (2022-10-08T11:39:15Z)
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Simultaneous Neural Machine Translation with Constituent Label
Prediction [35.74159659906497]
Simultaneous translation is a task in which translation begins before the speaker has finished speaking.
We propose a couple of simple decision rules using the label of the next constituent predicted by incremental constituent label prediction.
In experiments on English-to-Japanese simultaneous translation, the proposed method outperformed baselines in the quality-latency trade-off.
arXiv Detail & Related papers (2021-10-26T08:23:20Z)
- SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End
Simultaneous Speech Translation [23.685648804345984]
Simultaneous text translation and end-to-end speech translation have recently made great progress, but little work has combined the two tasks.
We investigate how to adapt simultaneous text translation methods such as wait-k and monotonic multihead attention to end-to-end simultaneous speech translation by introducing a pre-decision module.
A detailed analysis is provided on the latency-quality trade-offs of combining fixed and flexible pre-decision with fixed and flexible policies.
arXiv Detail & Related papers (2020-11-03T22:47:58Z)
- Improving Simultaneous Translation by Incorporating Pseudo-References
with Fewer Reorderings [24.997435410680378]
We propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translation.
Experiments on Zh->En and Ja->En simultaneous translation show substantial improvements with the addition of these generated pseudo-references.
arXiv Detail & Related papers (2020-10-21T19:03:06Z)
- Lexically Cohesive Neural Machine Translation with Copy Mechanism [21.43163704217968]
We incorporate a copy mechanism into a context-aware neural machine translation model to allow copying words from previous outputs.
We conduct experiments on Japanese to English translation using an evaluation dataset for discourse translation.
arXiv Detail & Related papers (2020-10-11T08:39:02Z)
- Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
arXiv Detail & Related papers (2020-05-05T09:02:25Z)
- Urdu-English Machine Transliteration using Neural Networks [0.0]
We present a transliteration technique based on Expectation Maximization (EM) that is unsupervised and language-independent.
The system learns patterns and out-of-vocabulary words from a parallel corpus, so there is no need to train it explicitly on a transliteration corpus.
arXiv Detail & Related papers (2020-01-12T17:30:42Z)
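The wait-k policy referenced in the abstract above, and adapted to speech in the SimulMT-to-SimulST entry, follows a simple fixed read/write schedule. Below is a minimal sketch of that schedule; the function name and the assumption that the output length matches the source length are illustrative simplifications, not any of these papers' implementations.

```python
# Minimal sketch of the wait-k read/write schedule: read k source tokens first,
# then alternate write-one / read-one until the source is exhausted, then write
# the remaining target tokens.

def wait_k_schedule(num_source_tokens, k):
    """Yield 'READ' / 'WRITE' actions for a wait-k policy over one sentence."""
    read, written = 0, 0
    # Initial waiting phase: read the first k source tokens.
    while read < min(k, num_source_tokens):
        read += 1
        yield "READ"
    # Steady state: emit one target token per newly read source token.
    while read < num_source_tokens:
        yield "WRITE"
        written += 1
        read += 1
        yield "READ"
    # Tail: the source is fully read; write until the translation is complete
    # (the true target length is unknown in general; we match the source length
    # here purely for illustration).
    while written < num_source_tokens:
        yield "WRITE"
        written += 1


print(list(wait_k_schedule(num_source_tokens=5, k=2)))
# -> ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'WRITE']
```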