Multilingual Bidirectional Unsupervised Translation Through Multilingual
Finetuning and Back-Translation
- URL: http://arxiv.org/abs/2209.02821v4
- Date: Mon, 3 Apr 2023 23:20:17 GMT
- Title: Multilingual Bidirectional Unsupervised Translation Through Multilingual
Finetuning and Back-Translation
- Authors: Bryan Li, Mohammad Sadegh Rasooli, Ajay Patel, Chris Callison-Burch
- Abstract summary: We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English.
For the first stage, we initialize an encoder-decoder model to pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 40 languages to English.
For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then bidirectionally train with successive rounds of back-translation.
- Score: 23.401781865904386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a two-stage approach for training a single NMT model to translate
unseen languages both to and from English. For the first stage, we initialize
an encoder-decoder model to pretrained XLM-R and RoBERTa weights, then perform
multilingual fine-tuning on parallel data in 40 languages to English. We find
this model can generalize to zero-shot translations on unseen languages. For
the second stage, we leverage this generalization ability to generate synthetic
parallel data from monolingual datasets, then bidirectionally train with
successive rounds of back-translation.
Our approach, which we call EcXTra (English-centric Crosslingual (X) Transfer), is
conceptually simple, using only a standard cross-entropy objective throughout.
It is also data-driven, sequentially leveraging auxiliary parallel data and
monolingual data. We evaluate unsupervised NMT results for 7 low-resource
languages, and find that each round of back-translation training further
refines bidirectional performance. Our final single EcXTra-trained model
achieves competitive translation performance in all translation directions,
notably establishing a new state-of-the-art for English-to-Kazakh (22.9 > 10.4
BLEU). Our code is available at https://github.com/manestay/EcXTra .
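The two-stage recipe can be pictured in a short Python sketch. The code below is illustrative only and is not the authors' released implementation (see the EcXTra repository linked above); it assumes the Hugging Face transformers library, and the model names, optimizer setup, batching, and handling of non-English decoding are simplifying assumptions made for this example.

```python
# Illustrative sketch of the EcXTra two-stage recipe (not the authors' code).
import torch
from transformers import AutoTokenizer, EncoderDecoderModel

# Stage 1: warm-start an encoder-decoder from pretrained XLM-R (encoder) and
# RoBERTa (decoder) weights, then fine-tune it on 40-language -> English
# parallel data with the standard cross-entropy objective.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-large",  # multilingual encoder initialization
    "roberta-large",      # English decoder initialization
)
src_tok = AutoTokenizer.from_pretrained("xlm-roberta-large")
tgt_tok = AutoTokenizer.from_pretrained("roberta-large")
model.config.decoder_start_token_id = tgt_tok.cls_token_id
model.config.pad_token_id = tgt_tok.pad_token_id
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)


def cross_entropy_step(src_texts, tgt_texts):
    """One supervised update with the standard cross-entropy objective
    (pad positions in the labels would be masked to -100 in practice)."""
    batch = src_tok(src_texts, return_tensors="pt", padding=True, truncation=True)
    labels = tgt_tok(tgt_texts, return_tensors="pt", padding=True, truncation=True).input_ids
    loss = model(input_ids=batch.input_ids,
                 attention_mask=batch.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()


def translate(texts):
    """Decode with the current model (in practice, beam search plus a
    target-language tag to steer the output direction)."""
    batch = src_tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(input_ids=batch.input_ids,
                         attention_mask=batch.attention_mask,
                         max_length=128)
    return tgt_tok.batch_decode(out, skip_special_tokens=True)


# Stage 2: successive rounds of back-translation on monolingual data for an
# unseen language X. Synthetic translations from the current model serve as
# the source side, while the original monolingual text is the target, so a
# single model is trained bidirectionally (X -> En and En -> X).
def back_translation_round(mono_x, mono_en):
    synth_en = translate(mono_x)          # X -> En, synthetic English sources
    synth_x = translate(mono_en)          # En -> X, synthetic X sources
    cross_entropy_step(synth_en, mono_x)  # learn En -> X from (synthetic, real)
    cross_entropy_step(synth_x, mono_en)  # learn X -> En from (synthetic, real)


for _ in range(2):  # each round further refines bidirectional performance
    back_translation_round(["A monolingual sentence in language X."],
                           ["A monolingual English sentence."])
```

In this sketch the same cross-entropy step is reused for the zero-shot fine-tuning stage and for every back-translation round, which mirrors the paper's claim that no objective beyond standard cross-entropy is needed.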
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, consists in adapting a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation [74.158365847236]
SixT++ is a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages.
It significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with average gains of 7.2 and 5.0 BLEU, respectively.
arXiv Detail & Related papers (2021-10-16T10:59:39Z)
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs.
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Complete Multilingual Neural Machine Translation [44.98358050355681]
We study the use of multi-way aligned examples to enrich the original English-centric parallel corpora.
We call MNMT with such a connectivity pattern complete Multilingual Neural Machine Translation (cMNMT).
In combination with a novel training data sampling strategy that is conditioned on the target language only, cMNMT yields competitive translation quality for all language pairs.
arXiv Detail & Related papers (2020-10-20T13:03:48Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)