Lego-MT: Learning Detachable Models for Massively Multilingual Machine
Translation
- URL: http://arxiv.org/abs/2212.10551v3
- Date: Wed, 19 Jul 2023 05:52:32 GMT
- Title: Lego-MT: Learning Detachable Models for Massively Multilingual Machine
Translation
- Authors: Fei Yuan, Yinquan Lu, WenHao Zhu, Lingpeng Kong, Lei Li, Yu Qiao,
Jingjing Xu
- Abstract summary: We propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.
Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU.
The proposed training recipe brings a 28.2$\times$ speedup over the conventional multi-way training method.
- Score: 48.37939354609931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual neural machine translation (MNMT) aims to build a unified model
for many language directions. Existing monolithic models for MNMT encounter two
challenges: parameter interference among languages and inefficient inference
for large models. In this paper, we revisit the classic multi-way structures
and develop a detachable model by assigning each language (or group of
languages) to an individual branch that supports plug-and-play training and
inference. To address the needs of learning representations for all languages
in a unified space, we propose a novel efficient training recipe, upon which we
build an effective detachable model, Lego-MT. For a fair comparison, we collect
data from OPUS and build a translation benchmark covering 433 languages and
1.3B parallel data. Experiments show that Lego-MT with 1.2B parameters brings
an average gain of 3.2 spBLEU. It even outperforms M2M-100 with 12B parameters.
The proposed training recipe brings a 28.2$\times$ speedup over the
conventional multi-way training method. Code: https://github.com/CONE-MT/Lego-MT
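The abstract assigns each language (or group of languages) to an individual branch that can be plugged in or detached for training and inference. Below is a minimal conceptual sketch in PyTorch of that routing idea; the class names (LanguageBranch, DetachableMT), layer sizes, and toy vocabularies are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

# Minimal conceptual sketch (assumed names and sizes, not the released Lego-MT code):
# each language gets its own branch, and only the branches needed for a given
# translation direction have to be loaded -- the "plug-and-play" idea.
import torch
import torch.nn as nn

D_MODEL = 512  # assumed size of the shared representation space

class LanguageBranch(nn.Module):
    """One detachable branch: embeddings plus a small Transformer encoder."""
    def __init__(self, vocab_size, d_model=D_MODEL):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        return self.encoder(self.embed(token_ids))

class DetachableMT(nn.Module):
    """Holds per-language branches; one direction only touches two of them."""
    def __init__(self, vocab_sizes):
        super().__init__()
        self.branches = nn.ModuleDict(
            {lang: LanguageBranch(size) for lang, size in vocab_sizes.items()}
        )

    def encode(self, lang, token_ids):
        # Route through the source-language branch only.
        return self.branches[lang](token_ids)

model = DetachableMT({"en": 1000, "fr": 1000, "de": 1000})
src = torch.randint(0, 1000, (2, 7))   # toy batch of token ids in "fr"
hidden = model.encode("fr", src)       # uses only the "fr" branch
print(hidden.shape)                    # torch.Size([2, 7, 512])

In the full model, such branches would be trained so that all languages map into one shared space before decoding; the sketch only illustrates the per-language routing and detachability.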
Related papers
- Machine Translation for Ge'ez Language [0.0]
Machine translation for low-resource languages such as Ge'ez faces challenges including out-of-vocabulary words, domain mismatches, and a lack of labeled training data.
We develop a multilingual neural machine translation (MNMT) model based on language relatedness.
We also experiment with using GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches.
arXiv Detail & Related papers (2023-11-24T14:55:23Z)
- Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks [77.90900650816046]
We introduce Zemi, a zero-shot semi-parametric language model.
We train Zemi with a novel semi-parametric multitask prompted training paradigm.
Specifically, we augment the multitask training and zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus.
arXiv Detail & Related papers (2022-10-01T04:08:50Z)
- Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations [75.73028056136778]
We show how to practically build MNMT systems that serve arbitrary X-Y translation directions.
We also examine our proposed approach in an extremely large-scale data setting to accommodate practical deployment scenarios.
arXiv Detail & Related papers (2022-06-30T02:18:15Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)