SMaLL-100: Introducing Shallow Multilingual Machine Translation Model
for Low-Resource Languages
- URL: http://arxiv.org/abs/2210.11621v1
- Date: Thu, 20 Oct 2022 22:32:29 GMT
- Title: SMaLL-100: Introducing Shallow Multilingual Machine Translation Model
for Low-Resource Languages
- Authors: Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline
Brun, James Henderson, Laurent Besacier
- Abstract summary: We introduce SMaLL-100, a distilled version of the M2M-100 (12B) machine translation model covering 100 languages.
We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages.
Our model achieves comparable results to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference.
- Score: 102.50127671423752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, multilingual machine translation models have achieved
promising performance on low-resource language pairs by sharing information
between similar languages, thus enabling zero-shot translation. To overcome the
"curse of multilinguality", these models often opt for scaling up the number of
parameters, which makes their use in resource-constrained environments
challenging. We introduce SMaLL-100, a distilled version of the M2M-100 (12B)
model, a massively multilingual machine translation model covering 100
languages. We train SMaLL-100 with uniform sampling across all language pairs
and therefore focus on preserving the performance of low-resource languages. We
evaluate SMaLL-100 on different low-resource benchmarks: FLORES-101, Tatoeba,
and TICO-19 and demonstrate that it outperforms previous massively multilingual
models of comparable sizes (200-600M) while improving inference latency and
memory usage. Additionally, our model achieves comparable results to M2M-100
(1.2B), while being 3.6x smaller and 4.3x faster at inference. Code and
pre-trained models: https://github.com/alirezamshi/small100
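The abstract's point about uniform sampling across language pairs can be illustrated with a small sketch. This is not the authors' training code: the corpus sizes, the function names, and the size-proportional baseline used for contrast are all assumptions made for illustration. It only shows why uniform sampling gives low-resource pairs a much larger share of training batches than sampling in proportion to corpus size.

```python
import random


def pair_probabilities(pair_sizes, temperature=None):
    """Return a sampling probability for each language pair.

    temperature=None gives uniform sampling over pairs.
    Otherwise p is proportional to size ** (1 / temperature), so
    temperature=1.0 means sampling proportional to corpus size.
    """
    if temperature is None:
        p = 1.0 / len(pair_sizes)
        return {pair: p for pair in pair_sizes}
    weights = {pair: size ** (1.0 / temperature) for pair, size in pair_sizes.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def sample_pairs(probs, n, seed=0):
    """Draw n language pairs according to the given probabilities."""
    rng = random.Random(seed)
    pairs = list(probs)
    return rng.choices(pairs, weights=[probs[p] for p in pairs], k=n)


# Hypothetical parallel-corpus sizes (sentence pairs); not taken from the paper.
PAIR_SIZES = {
    ("en", "fr"): 1_000_000,  # high-resource
    ("en", "sw"): 10_000,     # low-resource
    ("fr", "sw"): 1_000,      # very low-resource
}

uniform = pair_probabilities(PAIR_SIZES)
proportional = pair_probabilities(PAIR_SIZES, temperature=1.0)

if __name__ == "__main__":
    for pair in PAIR_SIZES:
        print(pair, f"uniform={uniform[pair]:.4f}", f"proportional={proportional[pair]:.6f}")
```

Under these assumed sizes, every pair is drawn with probability 1/3 under uniform sampling, whereas the smallest pair is drawn less than 1% of the time under size-proportional sampling.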
Related papers
- Paramanu: A Family of Novel Efficient Generative Foundation Language Models for Indian Languages [3.9018931027384056]
We present "Paramanu", a family of novel language models (LM) for Indian languages.
It covers 10 languages (Assamese, Bangla, Hindi, Konkani, Maithili, Marathi, Odia, Sanskrit, Tamil, Telugu) across 5 scripts.
The models are pretrained on a single GPU with context size of 1024 and vary in size from 13.29 million (M) to 367.5 M parameters.
arXiv Detail & Related papers (2024-01-31T17:58:10Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z)
- Investigating the Translation Performance of a Large Multilingual
Language Model: the Case of BLOOM [8.858671209228536]
We focus on BLOOM's multilingual ability by evaluating its machine translation performance across several datasets.
We study several aspects including prompt design, model sizes, cross-lingual transfer and the use of discursive context.
arXiv Detail & Related papers (2023-03-03T13:23:42Z)
- MiLMo: Minority Multilingual Pre-trained Language Model [1.6409017540235764]
This paper constructs a multilingual pre-trained model named MiLMo that performs better on minority language tasks.
By comparing the word2vec model and the pre-trained model in the text classification task, this paper provides an optimal scheme for the downstream task research of minority languages.
arXiv Detail & Related papers (2022-12-04T09:28:17Z)
- Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when translating directly between non-English directions, while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Improving Massively Multilingual Neural Machine Translation and
Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)