mGPT: Few-Shot Learners Go Multilingual
- URL: http://arxiv.org/abs/2204.07580v2
- Date: Thu, 12 Oct 2023 17:07:29 GMT
- Title: mGPT: Few-Shot Learners Go Multilingual
- Authors: Oleh Shliazhko, Alena Fenogenova, Maria Tikhonova, Vladislav
Mikhailov, Anastasia Kozlova, Tatiana Shavrina
- Abstract summary: This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages.
We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism.
The resulting models show performance on par with the recently released XGLM models by Facebook.
- Score: 1.4354798873010843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies report that autoregressive language models can successfully
solve many NLP tasks via zero- and few-shot learning paradigms, which opens up
new possibilities for using pre-trained language models. This paper
introduces two autoregressive GPT-like models with 1.3 billion and 13 billion
parameters trained on 60 languages from 25 language families using Wikipedia
and the Colossal Clean Crawled Corpus. We reproduce the GPT-3 architecture using
GPT-2 sources and the sparse attention mechanism; the DeepSpeed and Megatron
frameworks allow us to parallelize the training and inference steps
effectively. The resulting models show performance on par with the recently
released XGLM models by Facebook, while covering more languages and enhancing NLP
possibilities for low-resource languages of the CIS countries and the small
nations of Russia. We detail the motivation behind the architecture design choices,
thoroughly describe the data preparation pipeline, and train five small
versions of the model to choose the optimal multilingual tokenization
strategy. We measure the model perplexity in all covered languages and evaluate
the models on a wide spectrum of multilingual tasks, including classification,
generation, sequence labeling, and knowledge probing, using zero-shot and
few-shot evaluation methods. Furthermore, on the classification tasks we compare
our models with the state-of-the-art multilingual model XGLM. The source
code and the mGPT XL model are publicly released.
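As a practical illustration of the zero-/few-shot usage and the per-language perplexity measurement described in the abstract, the sketch below uses the standard Hugging Face causal-LM interface that GPT-like checkpoints expose. It is not the authors' evaluation code; the model identifier "ai-forever/mGPT" and the prompt format are assumptions.

```python
# Illustrative sketch only: few-shot prompting and perplexity measurement with a
# causal multilingual LM through Hugging Face transformers. The model identifier
# "ai-forever/mGPT" and the prompt format are assumptions, not taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ai-forever/mGPT"  # assumed public checkpoint of the 1.3B model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Few-shot prompt: two in-context examples followed by a new query.
prompt = (
    "English: thank you => Russian: спасибо\n"
    "English: good morning => Russian: доброе утро\n"
    "English: see you tomorrow => Russian:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

# Per-language perplexity: exponentiated mean token-level cross-entropy on held-out text.
def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # shifted next-token NLL, averaged
    return float(torch.exp(loss))
```

Greedy decoding is used only to keep the sketch deterministic; the paper's actual evaluation setup is described in the full text.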
Related papers
- From N-grams to Pre-trained Multilingual Models For Language Identification [0.35760345713831954]
We investigate the use of N-gram models and Large Pre-trained Multilingual models for Language Identification (LID) across 11 South African languages.
For N-gram models, this study shows that effective data size selection remains crucial for establishing effective frequency distributions of the target languages.
We show that Serengeti is, on average, the superior model across the board, from N-gram models to Transformers.
arXiv Detail & Related papers (2024-10-11T11:35:57Z)
- Generative Model for Less-Resourced Language with 1 billion parameters [0.0]
GaMS 1B - Generative Model for Slovene with 1 billion parameters was created by continuing pretraining of the existing English OPT model.
We develop a new tokenizer adapted to Slovene, Croatian, and English languages.
We evaluate our models on several classification datasets from the Slovene suite of benchmarks and generative sentence simplification task SENTA.
arXiv Detail & Related papers (2024-10-09T13:59:34Z)
- PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z)
- Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores methods to endow multilingual understanding models with generation abilities so as to obtain a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder into a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z)
- Bidirectional Language Models Are Also Few-shot Learners [54.37445173284831]
We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models.
We show SAP is effective on question answering and summarization.
For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models.
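As a rough sketch of the general idea (under assumptions: this is not the SAP implementation, and xlm-roberta-base is only a stand-in bidirectional model), a masked LM can be driven left-to-right by repeatedly appending a mask token and committing its greedy prediction:

```python
# Rough sketch of driving a bidirectional (masked) LM autoregressively, in the
# spirit of sequential prompting; not the paper's SAP implementation.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed stand-in model
mlm = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
mlm.eval()

prompt_ids = tok("Paris is the capital of", return_tensors="pt")["input_ids"][0]
generated = prompt_ids[:-1].tolist()  # keep <s> + prompt tokens, drop trailing </s>

for _ in range(5):  # commit one greedy prediction per step
    inp = torch.tensor([generated + [tok.mask_token_id, tok.eos_token_id]])
    with torch.no_grad():
        logits = mlm(inp).logits
    generated.append(int(logits[0, len(generated)].argmax()))  # fill the <mask> slot

print(tok.decode(generated, skip_special_tokens=True))
```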
arXiv Detail & Related papers (2022-09-29T01:35:57Z)
- Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training [21.934439663979663]
We propose a multi-branch multilingual language model (MBLM) built on multilingual pre-trained language models (MPLMs).
The method is based on transferring knowledge from high-performance monolingual models within a teacher-student framework.
Results on two cross-lingual classification tasks show that, with only the task's supervised data used, our method improves both the supervised and zero-shot performance of MPLMs.
arXiv Detail & Related papers (2022-02-28T09:51:32Z)
- Cedille: A large autoregressive French language model [0.21756081703276003]
We introduce Cedille, a large open-source autoregressive language model trained specifically for the French language.
Our results show that Cedille outperforms existing French language models and is competitive with GPT-3 on a range of French zero-shot benchmarks.
arXiv Detail & Related papers (2022-02-07T17:40:43Z)
- Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets a new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Structure-Level Knowledge Distillation For Multilingual Sequence Labeling [73.40368222437912]
We propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) into the unified multilingual model (the student).
Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.
arXiv Detail & Related papers (2020-04-08T07:14:01Z)
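For context on the teacher-student setups mentioned in the last two entries, the sketch below shows the plain token-level distillation objective such methods build on; it is a simplified assumption, not the structure-level objective proposed in that paper.

```python
# Simplified token-level distillation loss (baseline form, not the paper's
# structure-level objective): the student matches softened teacher distributions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Temperatures > 1 soften the distributions and expose the teacher's "dark knowledge".
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```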
This list is automatically generated from the titles and abstracts of the papers on this site.