Zero-shot Cross-lingual Transfer of Neural Machine Translation with
Multilingual Pretrained Encoders
- URL: http://arxiv.org/abs/2104.08757v1
- Date: Sun, 18 Apr 2021 07:42:45 GMT
- Title: Zero-shot Cross-lingual Transfer of Neural Machine Translation with
Multilingual Pretrained Encoders
- Authors: Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan,
Wenping Wang, Furu Wei
- Abstract summary: How to improve the cross-lingual transfer of an NMT model with a multilingual pretrained encoder is under-explored.
We propose SixT, a simple yet effective model for this task.
Our model achieves better performance on many-to-English test sets than CRISS and m2m-100.
- Score: 74.89326277221072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous works mainly focus on improving cross-lingual transfer for NLU
tasks with a multilingual pretrained encoder (MPE), or on improving translation
performance on the NMT task with BERT. However, how to improve the cross-lingual
transfer of an NMT model with a multilingual pretrained encoder is under-explored.
In this paper, we focus on a zero-shot cross-lingual transfer task in NMT: the
NMT model is trained with one parallel dataset and an off-the-shelf MPE, and is
then directly tested on zero-shot language pairs. We propose SixT, a simple yet
effective model for this task. SixT leverages the MPE with a two-stage training
schedule and gains further improvement from a position-disentangled encoder and
a capacity-enhanced decoder. Extensive experiments show that SixT significantly
improves translation quality on the unseen languages. With much less computation
cost and training data, our model achieves better performance on many-to-English
test sets than CRISS and m2m-100, two strong multilingual NMT baselines.
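The abstract sketches a two-stage training schedule on top of an off-the-shelf MPE. Below is a minimal PyTorch illustration of that schedule only, under the assumption that stage 1 trains a randomly initialized, deeper ("capacity-enhanced") decoder on top of a frozen pretrained encoder and stage 2 fine-tunes the whole model; the stand-in encoder, module names, and sizes are illustrative, not the actual SixT implementation, and the position-disentangled encoder is not modeled here.

```python
import torch
import torch.nn as nn

class MPESeq2Seq(nn.Module):
    """Toy encoder-decoder; the encoder stands in for a multilingual
    pretrained encoder (e.g. XLM-R) that would be loaded from pretrained
    weights in practice."""
    def __init__(self, vocab_size=32_000, d_model=768, dec_layers=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True),
            num_layers=12,
        )
        # "Capacity-enhanced" is read here as simply a deeper decoder.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=12, batch_first=True),
            num_layers=dec_layers,
        )
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.embed(src_ids))
        hidden = self.decoder(self.embed(tgt_ids), memory)
        return self.proj(hidden)

def set_stage(model: MPESeq2Seq, stage: int) -> None:
    """Stage 1: freeze the pretrained encoder parameters; everything else
    stays trainable. Stage 2: unfreeze and fine-tune the whole model."""
    for p in model.encoder.parameters():
        p.requires_grad = (stage == 2)

model = MPESeq2Seq()
set_stage(model, stage=1)  # train the decoder on the single parallel dataset
# ... stage-1 optimizer updates ...
set_stage(model, stage=2)  # then fine-tune encoder and decoder jointly
# ... stage-2 optimizer updates ...
```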
Related papers
- EMMeTT: Efficient Multimodal Machine Translation Training [26.295981183965566]
We propose a joint multimodal training regime of Speech-LLM to include automatic speech translation (AST).
To handle joint multimodal training, we propose a novel training framework called EMMeTT.
The resultant Multimodal Translation Model produces strong text and speech translation results at the same time.
arXiv Detail & Related papers (2024-09-20T14:03:23Z)
- Direct Neural Machine Translation with Task-level Mixture of Experts models [1.2338729811609357]
Direct neural machine translation (direct NMT) translates text between two non-English languages.
Task-level Mixture-of-Experts (Task-level MoE) models have shown promising NMT performance for a large number of language pairs.
arXiv Detail & Related papers (2023-10-18T18:19:45Z) - Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve the zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z) - Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual
Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z)
- Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation [74.158365847236]
SixT++ is a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages.
It significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with average gains of 7.2 and 5.0 BLEU, respectively.
arXiv Detail & Related papers (2021-10-16T10:59:39Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context; a toy sketch of the two corruption styles appears after this list.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Multi-task Learning for Multilingual Neural Machine Translation [32.81785430242313]
We propose a multi-task learning framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data.
We show that the proposed approach can effectively improve the translation quality for both high-resource and low-resource languages.
arXiv Detail & Related papers (2020-10-06T06:54:12Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
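For the "Exploring Unsupervised Pretraining Objectives for Machine Translation" entry above, the following toy Python sketch contrasts the two input-corruption styles it summarizes: MLM-style masking versus producing inputs that still look like complete sentences via reordering and word replacement. The function names, ratios, and the random (rather than context-based) replacement are simplifying assumptions for illustration, not the paper's actual objectives.

```python
import random

MASK = "<mask>"

def mask_input(tokens, ratio=0.35, rng=random):
    """MLM-style corruption for seq2seq pretraining: hide a fraction of the
    input tokens; the decoder would learn to reconstruct the original."""
    out = list(tokens)
    for i in rng.sample(range(len(out)), k=max(1, int(ratio * len(out)))):
        out[i] = MASK
    return out

def reorder_and_replace(tokens, swap_ratio=0.2, replace_ratio=0.1,
                        filler=("the", "a", "of", "and"), rng=random):
    """Alternative corruption: the input still resembles a real, full sentence.
    Context-based replacement is approximated here by substituting words from a
    small filler list; reordering swaps adjacent tokens."""
    out = list(tokens)
    for _ in range(max(1, int(swap_ratio * len(out)))):
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    for i in rng.sample(range(len(out)), k=max(1, int(replace_ratio * len(out)))):
        out[i] = rng.choice(filler)
    return out

sentence = "the quick brown fox jumps over the lazy dog".split()
print(mask_input(sentence))           # e.g. ['the', '<mask>', 'brown', ...]
print(reorder_and_replace(sentence))  # a perturbed but full-looking sentence
```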
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.