Breaking Down Multilingual Machine Translation
- URL: http://arxiv.org/abs/2110.08130v1
- Date: Fri, 15 Oct 2021 14:57:12 GMT
- Title: Breaking Down Multilingual Machine Translation
- Authors: Ting-Rui Chiang, Yi-Pei Chen, Yi-Ting Yeh, Graham Neubig
- Abstract summary: We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al. (2019).
- Score: 74.24795388967907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While multilingual training is now an essential ingredient in machine
translation (MT) systems, recent work has demonstrated that it has different
effects in different multilingual settings, such as many-to-one, one-to-many,
and many-to-many learning. These training settings expose the encoder and the
decoder in a machine translation model to different data distributions. In
this paper, we examine how different varieties of multilingual training
contribute to learning these two components of the MT model. Specifically, we
compare bilingual models with encoders and/or decoders initialized by
multilingual training. We show that multilingual training is beneficial to
encoders in general, while it only benefits decoders for low-resource languages
(LRLs). We further find the important attention heads for each language pair
and compare their correlations during inference. Our analysis sheds light on
how multilingual translation models work and also enables us to propose methods
to improve performance by training with highly related languages. Our
many-to-one models for high-resource languages and one-to-many models for LRLs
outperform the best results reported by Aharoni et al. (2019).
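The comparison described in the abstract (bilingual models whose encoder and/or decoder is initialized from a multilingually trained model) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' code: checkpoints are modelled as plain dictionaries, the parameter names follow an "encoder."/"decoder." prefix convention, and the helper name `init_from_multilingual` is hypothetical.

```python
def init_from_multilingual(bilingual_state, multilingual_state,
                           components=("encoder",)):
    """Return a copy of bilingual_state in which the parameters of the
    chosen components are overwritten by the multilingual ones.

    Assumption: parameter names start with "encoder." or "decoder.".
    """
    prefixes = tuple(f"{name}." for name in components)
    merged = dict(bilingual_state)  # leave the input untouched
    for key, value in multilingual_state.items():
        # Copy only parameters that belong to a selected component
        # and that also exist in the bilingual model.
        if key.startswith(prefixes) and key in merged:
            merged[key] = value
    return merged


# Toy stand-ins for real checkpoints (strings instead of tensors).
bilingual = {
    "encoder.layer0.weight": "bi-enc",
    "decoder.layer0.weight": "bi-dec",
}
multilingual = {
    "encoder.layer0.weight": "multi-enc",
    "decoder.layer0.weight": "multi-dec",
}

# The three initialization settings: encoder only, decoder only, both.
enc_only = init_from_multilingual(bilingual, multilingual, ("encoder",))
dec_only = init_from_multilingual(bilingual, multilingual, ("decoder",))
both = init_from_multilingual(bilingual, multilingual, ("encoder", "decoder"))
```

In practice each resulting state would be loaded into a bilingual model and trained further on a single language pair, so that the effect of multilingual training on each component can be measured separately.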
Related papers
- Multilingual Large Language Models and Curse of Multilinguality [4.096453902709292]
Large Language Models (LLMs) have gained large popularity among Natural Language Processing (NLP) researchers and practitioners.
This paper navigates the landscape of multilingual LLMs, providing an introductory overview of their technical aspects.
It explains underlying architectures, objective functions, pre-training data sources, and tokenization methods.
arXiv Detail & Related papers (2024-06-15T11:31:39Z)
- Multilingual Multimodal Learning with Machine Translated Text [27.7207234512674]
We investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data.
We propose two metrics for automatically removing low-quality translations from the resulting datasets.
In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning.
arXiv Detail & Related papers (2022-10-24T11:41:20Z)
- Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages [12.00637655338665]
We study very low-resource languages and handle 50 African languages, many of which are not covered by any other model.
We train sentence encoders, mine bitexts, and validate the bitexts by training NMT systems.
arXiv Detail & Related papers (2022-05-25T10:53:24Z)
- Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training [21.934439663979663]
We propose a multi-branch multilingual language model (MBLM) built on multilingual pre-trained language models (MPLMs).
The method is based on transferring knowledge from high-performance monolingual models with a teacher-student framework.
Results on two cross-lingual classification tasks show that, with only the task's supervised data used, our method improves both the supervised and zero-shot performance of MPLMs.
arXiv Detail & Related papers (2022-02-28T09:51:32Z)
- How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models [96.32118305166412]
We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks.
We find that languages which are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases over their monolingual counterparts.
arXiv Detail & Related papers (2020-12-31T14:11:00Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
- Balancing Training for Multilingual Neural Machine Translation [130.54253367251738]
Multilingual machine translation (MT) models can translate to/from multiple languages.
Standard practice is to up-sample less resourced languages to increase representation.
We propose a method that instead automatically learns how to weight training data through a data scorer.
arXiv Detail & Related papers (2020-04-14T18:23:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.