Related papers: Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?

Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?

URL: http://arxiv.org/abs/2312.12683v1
Date: Wed, 20 Dec 2023 00:49:52 GMT
Title: Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?
Authors: Tannon Kew, Florian Schottmann, Rico Sennrich
Abstract summary: Cross-lingual transfer is important for achieving decent performance in non-English settings. We find that, compared to English-only finetuning, multilingual instruction tuning with as few as three languages significantly improves a model's cross-lingual transfer abilities.
Score: 45.10397051167781
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The vast majority of today's large language models are English-centric, having been pretrained predominantly on English text. Yet, in order to meet user expectations, models need to be able to respond appropriately in multiple languages once deployed in downstream applications. Given limited exposure to other languages during pretraining, cross-lingual transfer is important for achieving decent performance in non-English settings. In this work, we investigate just how much multilinguality is required during finetuning to elicit strong cross-lingual generalisation across a range of tasks and target languages. We find that, compared to English-only finetuning, multilingual instruction tuning with as few as three languages significantly improves a model's cross-lingual transfer abilities on generative tasks that assume input/output language agreement, while being of less importance for highly structured tasks. Our code and data is available at https://github.com/ZurichNLP/multilingual-instruction-tuning.

Related papers

CoCo-CoLa: Evaluating Language Adherence in Multilingual LLMs [1.2057938662974816]
Large Language Models (LLMs) develop cross-lingual abilities despite being trained on limited parallel data. We introduce CoCo-CoLa, a novel metric to evaluate language adherence in multilingual LLMs.
arXiv Detail & Related papers (2025-02-18T03:03:53Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs) It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models [79.46179534911019]
Large language models (LLMs) have demonstrated multilingual capabilities; yet, they are mostly English-centric due to imbalanced training corpora. This work extends the evaluation from NLP tasks to real user queries. For culture-related tasks that need deep language understanding, prompting in the native language tends to be more promising.
arXiv Detail & Related papers (2024-03-15T12:47:39Z)
How Vocabulary Sharing Facilitates Multilingualism in LLaMA? [19.136382859468693]
Large Language Models (LLMs) often show strong performance on English tasks, while exhibiting limitations on other languages. This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective.
arXiv Detail & Related papers (2023-11-15T16:13:14Z)
PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLMs) trained on 640 billion (B) tokens, avaliable in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z)
Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks. We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset. To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages [19.067718464786463]
We perform multilingual adaptive fine-tuning (MAFT) on 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that corresponds to non-African writing scripts before MAFT. Our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space.
arXiv Detail & Related papers (2022-04-13T16:13:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.