Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural
Machine Translation
- URL: http://arxiv.org/abs/2110.08547v1
- Date: Sat, 16 Oct 2021 10:59:39 GMT
- Title: Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural
Machine Translation
- Authors: Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping
Wang, Furu Wei
- Abstract summary: SixT++ is a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages.
It significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with average gains of 7.2 and 5.0 BLEU, respectively.
- Score: 74.158365847236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper demonstrates that multilingual pretraining, a proper fine-tuning
method and a large-scale parallel dataset from multiple auxiliary languages are
all critical for zero-shot translation, where the NMT model is tested on source
languages unseen during supervised training. Following this idea, we present
SixT++, a strong many-to-English NMT model that supports 100 source languages
but is trained once with a parallel dataset from only six source languages.
SixT++ initializes the decoder embedding and the full encoder with XLM-R large,
and then trains the encoder and decoder layers with a simple two-stage training
strategy. SixT++ achieves impressive performance on many-to-English
translation. It significantly outperforms CRISS and m2m-100, two strong
multilingual NMT systems, with average gains of 7.2 and 5.0 BLEU,
respectively. Additionally, SixT++ offers a set of model parameters that can be
further fine-tuned to develop unsupervised NMT models for low-resource
languages. With back-translation on monolingual data of the low-resource
language, it outperforms all current state-of-the-art unsupervised methods on
Nepali and Sinhala for both translating into and from English.
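
The initialization and two-stage recipe described above can be pictured with a short PyTorch-style sketch. This is a minimal illustration under stated assumptions, not the authors' released code: the decoder depth, head count, and the tying of the decoder embeddings to the XLM-R input embeddings are illustrative choices; only the idea of freezing the pretrained XLM-R encoder in stage one and unfreezing it in stage two is taken from the abstract.

# Sketch of SixT++-style initialization and two-stage fine-tuning (illustrative only).
import torch.nn as nn
from transformers import XLMRobertaModel

# Full encoder initialized with XLM-R large, as described in the abstract.
encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")
hidden = encoder.config.hidden_size  # 1024 for XLM-R large

# Decoder embeddings initialized from the XLM-R input embeddings (assumed tying).
decoder_embed = nn.Embedding.from_pretrained(
    encoder.get_input_embeddings().weight.clone(), freeze=False
)

# Randomly initialized decoder layers; depth and head count are illustrative.
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=hidden, nhead=16, batch_first=True),
    num_layers=12,
)
output_proj = nn.Linear(hidden, encoder.config.vocab_size)

def set_stage(stage: int) -> None:
    # Stage 1: freeze the pretrained encoder and train only the decoder side.
    # Stage 2: unfreeze everything and fine-tune the full model.
    for p in encoder.parameters():
        p.requires_grad = (stage == 2)

set_stage(1)  # run the stage-1 training loop on the six-language parallel data
set_stage(2)  # then continue training with all parameters unfrozen
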
Related papers
- Multilingual Bidirectional Unsupervised Translation Through Multilingual
Finetuning and Back-Translation [23.401781865904386]
We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English.
For the first stage, we initialize an encoder-decoder model with pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data from 40 languages into English.
For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then bidirectionally train with successive rounds of back-translation (a schematic of one such round is sketched after this list).
arXiv Detail & Related papers (2022-09-06T21:20:41Z)
- Language Models are Good Translators [63.528370845657896]
We show that a single language model (LM4MT) can achieve performance comparable to strong encoder-decoder NMT models.
Experiments on pivot-based and zero-shot translation tasks show that LM4MT can outperform encoder-decoder NMT models by a large margin.
arXiv Detail & Related papers (2021-06-25T13:30:29Z)
- Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders [74.89326277221072]
How to improve the cross-lingual transfer of NMT models with a multilingual pretrained encoder is under-explored.
We propose SixT, a simple yet effective model for this task.
Our model achieves better performance than CRISS and m2m-100 on many-to-English test sets.
arXiv Detail & Related papers (2021-04-18T07:42:45Z)
- Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders [5.050654565113709]
Current end-to-end approaches to Spoken Language Translation rely on limited training resources.
Our proposed method extends a MultiNMT architecture based on language-specific encoders-decoders to the task of Multilingual SLT.
arXiv Detail & Related papers (2020-11-02T16:31:14Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT [129.99918589405675]
We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq).
arXiv Detail & Related papers (2020-09-16T11:37:10Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
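
Several of the entries above, like SixT++ itself, rely on successive rounds of back-translation over monolingual data. The following is a minimal schematic of one bidirectional round between English and a low-resource language (LRL); model, translate, and train_on_pairs are hypothetical placeholders standing in for whatever decoding and training routines a given system uses, not a real API.

# Schematic of one bidirectional back-translation round (illustrative only).
# `model`, `translate`, and `train_on_pairs` are hypothetical placeholders.

def back_translation_round(model, mono_lrl, mono_en):
    # Pair real English targets with synthetic low-resource sources,
    # producing training data for the LRL -> English direction.
    pseudo_lrl = [translate(model, s, src="en", tgt="lrl") for s in mono_en]
    lrl_to_en = list(zip(pseudo_lrl, mono_en))

    # Pair real low-resource targets with synthetic English sources,
    # producing training data for the English -> LRL direction.
    pseudo_en = [translate(model, s, src="lrl", tgt="en") for s in mono_lrl]
    en_to_lrl = list(zip(pseudo_en, mono_lrl))

    # Update the model on both synthetic directions; repeating this loop
    # gives the successive rounds described above.
    return train_on_pairs(model, lrl_to_en + en_to_lrl)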