Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is
Needed?
- URL: http://arxiv.org/abs/2312.12683v1
- Date: Wed, 20 Dec 2023 00:49:52 GMT
- Title: Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is
Needed?
- Authors: Tannon Kew, Florian Schottmann, Rico Sennrich
- Abstract summary: Cross-lingual transfer is important for achieving decent performance in non-English settings.
We find that, compared to English-only finetuning, multilingual instruction tuning with as few as three languages significantly improves a model's cross-lingual transfer abilities.
- Score: 45.10397051167781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The vast majority of today's large language models are English-centric,
having been pretrained predominantly on English text. Yet, in order to meet
user expectations, models need to be able to respond appropriately in multiple
languages once deployed in downstream applications. Given limited exposure to
other languages during pretraining, cross-lingual transfer is important for
achieving decent performance in non-English settings. In this work, we
investigate just how much multilinguality is required during finetuning to
elicit strong cross-lingual generalisation across a range of tasks and target
languages. We find that, compared to English-only finetuning, multilingual
instruction tuning with as few as three languages significantly improves a
model's cross-lingual transfer abilities on generative tasks that assume
input/output language agreement, while being of less importance for highly
structured tasks. Our code and data is available at
https://github.com/ZurichNLP/multilingual-instruction-tuning.
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language [2.9914612342004503]
This study explores an alternative solution by adapting large language models, primarily trained on English, to low-resource languages.
We assess various strategies, including continual training, instruction fine-tuning, task-specific fine-tuning, and vocabulary extension.
The results show that continual training improves language comprehension, as reflected in perplexity scores, and task-specific tuning generally enhances performance of downstream tasks.
arXiv Detail & Related papers (2024-05-13T13:41:59Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Efficiently Aligned Cross-Lingual Transfer Learning for Conversational
Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z) - Crosslingual Generalization through Multitask Finetuning [80.8822603322471]
Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting.
We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0.
We find finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages.
arXiv Detail & Related papers (2022-11-03T13:19:32Z) - Multilingual Language Model Adaptive Fine-Tuning: A Study on African
Languages [19.067718464786463]
We perform multilingual adaptive fine-tuning (MAFT) on 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent.
To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that corresponds to non-African writing scripts before MAFT.
Our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space.
arXiv Detail & Related papers (2022-04-13T16:13:49Z) - Multilingual Translation with Extensible Multilingual Pretraining and
Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.