Bilingual Synchronization: Restoring Translational Relationships with
Editing Operations
- URL: http://arxiv.org/abs/2210.13163v1
- Date: Mon, 24 Oct 2022 12:25:44 GMT
- Title: Bilingual Synchronization: Restoring Translational Relationships with
Editing Operations
- Authors: Jitao Xu, Josep Crego, François Yvon
- Abstract summary: We consider a more general setting which assumes an initial target sequence, that must be transformed into a valid translation of the source.
Our results suggest that one single generic edit-based system, once fine-tuned, can compare with, or even outperform, dedicated systems specifically trained for these tasks.
- Score: 2.0411082897313984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Translation (MT) is usually viewed as a one-shot process that
generates the target language equivalent of some source text from scratch. We
consider here a more general setting which assumes an initial target sequence,
that must be transformed into a valid translation of the source, thereby
restoring parallelism between source and target. For this bilingual
synchronization task, we consider several architectures (both autoregressive
and non-autoregressive) and training regimes, and experiment with multiple
practical settings such as simulated interactive MT, translating with
Translation Memory (TM) and TM cleaning. Our results suggest that one single
generic edit-based system, once fine-tuned, can compare with, or even
outperform, dedicated systems specifically trained for these tasks.
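To illustrate the edit-based view of bilingual synchronization described in the abstract (this is only an illustration of token-level editing, not the paper's actual model), the sketch below derives the edit operations that transform an initial target sequence into the reference translation, using Python's standard `difflib` as a stand-in for a learned edit system:

```python
import difflib

def edit_script(initial, reference):
    """Token-level edit operations turning `initial` into `reference`."""
    ops = []
    matcher = difflib.SequenceMatcher(a=initial, b=reference, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", initial[i1:i2]))
        elif tag == "delete":
            ops.append(("delete", initial[i1:i2]))
        elif tag == "insert":
            ops.append(("insert", reference[j1:j2]))
        else:  # "replace": substitute one span for another
            ops.append(("replace", initial[i1:i2], reference[j1:j2]))
    return ops

# A hypothetical MT draft edited toward its reference translation:
initial = "the cat sat on mat".split()
reference = "the cat sat on the mat".split()
print(edit_script(initial, reference))
# → [('keep', ['the', 'cat', 'sat', 'on']), ('insert', ['the']), ('keep', ['mat'])]
```

In the paper's settings (interactive MT, TM-based translation, TM cleaning), such keep/insert/delete decisions are produced by a trained model conditioned on the source sentence rather than by string matching against a known reference.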
Related papers
- Shiftable Context: Addressing Training-Inference Context Mismatch in
Simultaneous Speech Translation [0.17188280334580192]
Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation.
We propose Shiftable Context to ensure consistent segment and context sizes are maintained throughout training and inference.
arXiv Detail & Related papers (2023-07-03T22:11:51Z)
- Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- Neural Machine Translation with Contrastive Translation Memories [71.86990102704311]
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios.
We propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence.
In the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence.
arXiv Detail & Related papers (2022-12-06T17:10:17Z)
- Building Multilingual Machine Translation Systems That Serve Arbitrary
X-Y Translations [75.73028056136778]
We show how to practically build MNMT systems that serve arbitrary X-Y translation directions.
We also examine our proposed approach in an extremely large-scale data setting to accommodate practical deployment scenarios.
arXiv Detail & Related papers (2022-06-30T02:18:15Z)
- Flow-Adapter Architecture for Unsupervised Machine Translation [0.3093890460224435]
We propose a flow-adapter architecture for unsupervised NMT.
We leverage normalizing flows to explicitly model the distributions of sentence-level latent representations.
This architecture allows for unsupervised training of each language independently.
arXiv Detail & Related papers (2022-04-26T11:00:32Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual
Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Multilingual Machine Translation Systems from Microsoft for WMT21 Shared
Task [95.06453182273027]
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
Our model submissions to the shared task were built with DeltaLM (https://aka.ms/deltalm), a generic pre-trained multilingual encoder-decoder model.
Our final submissions ranked first on three tracks in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2021-11-03T09:16:17Z)
- SimulEval: An Evaluation Toolkit for Simultaneous Translation [59.02724214432792]
Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario.
SimulEval is an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation.
arXiv Detail & Related papers (2020-07-31T17:44:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.