BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
- URL: http://arxiv.org/abs/2212.09535v3
- Date: Sat, 27 May 2023 05:48:38 GMT
- Title: BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
- Authors: Zheng-Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri
Aji, David Ifeoluwa Adelani, Khalid Almubarak, M Saiful Bari, Lintang
Sutawika, Jungo Kasai, Ahmed Baruwa, Genta Indra Winata, Stella Biderman,
Edward Raff, Dragomir Radev and Vassilina Nikoulina
- Abstract summary: The BLOOM model is a large publicly available multilingual language model, but its pretraining was limited to 46 languages.
We apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages.
We conclude that with sufficient training data language adaptation can generalize well to diverse languages.
- Score: 50.24676567971536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The BLOOM model is a large publicly available multilingual language model,
but its pretraining was limited to 46 languages. To extend the benefits of
BLOOM to other languages without incurring prohibitively large costs, it is
desirable to adapt BLOOM to new languages not seen during pretraining. In this
work, we apply existing language adaptation strategies to BLOOM and benchmark
its zero-shot prompting performance on eight new languages in a
resource-constrained setting. We find language adaptation to be effective at
improving zero-shot performance in new languages. Surprisingly, we find that
adapter-based finetuning is more effective than continued pretraining for large
models. In addition, we discover that prompting performance is not
significantly affected by language specifics, such as the writing system. It is
primarily determined by the size of the language adaptation data. We also add
new languages to BLOOMZ, which is a multitask finetuned version of BLOOM
capable of following task instructions zero-shot. We find including a new
language in the multitask fine-tuning mixture to be the most effective method
to teach BLOOMZ a new language. We conclude that with sufficient training data
language adaptation can generalize well to diverse languages. Our code is
available at https://github.com/bigscience-workshop/multilingual-modeling.
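As a rough illustration of the adapter-based route that the abstract reports works best for large models, the sketch below attaches LoRA adapters to a small BLOOM checkpoint using the HuggingFace transformers, peft, and datasets libraries and trains them on monolingual text in a new language. The checkpoint, corpus, and hyperparameters are placeholders rather than the paper's settings; the authors' actual implementation lives in the repository linked above.

```python
# Illustrative sketch of adapter-based language adaptation (not the paper's exact
# recipe): freeze the BLOOM base weights and train LoRA adapters on monolingual
# text in the new language.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "bigscience/bloom-560m"          # small variant, for illustration only
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adapters on BLOOM's fused attention projection; rank/alpha are illustrative.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["query_key_value"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)           # base parameters stay frozen

# Any monolingual corpus in the target language can stand in here; a small OSCAR
# slice (Thai, one of the paper's new languages) is used as a placeholder.
raw = load_dataset("oscar", "unshuffled_deduplicated_th", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_data = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bloom-adapted",
                           per_device_train_batch_size=4,
                           num_train_epochs=1,
                           learning_rate=1e-4),
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

After training, only the small adapter weights need to be stored and loaded alongside the frozen base model, which is what makes this route attractive in the resource-constrained setting the paper studies.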
Related papers
- MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing [78.62611800987817]
Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data.
We propose a method called MoE-LPR (Mixture-of-Experts with Language Priors Routing) to enhance multilingual capability.
arXiv Detail & Related papers (2024-08-21T07:43:49Z)
- LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language [2.9914612342004503]
This study explores an alternative solution by adapting large language models, primarily trained on English, to low-resource languages.
We assess various strategies, including continual training, instruction fine-tuning, task-specific fine-tuning, and vocabulary extension.
The results show that continual training improves language comprehension, as reflected in perplexity scores, and that task-specific tuning generally enhances performance on downstream tasks.
arXiv Detail & Related papers (2024-05-13T13:41:59Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM [8.858671209228536]
We focus on BLOOM's multilingual ability by evaluating its machine translation performance across several datasets.
We study several aspects including prompt design, model sizes, cross-lingual transfer and the use of discursive context.
arXiv Detail & Related papers (2023-03-03T13:23:42Z)
- Generalizing Multimodal Pre-training into Multilingual via Language Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a MultiLingual Acquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into a multilingual one.
arXiv Detail & Related papers (2022-05-29T08:53:22Z)
- Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The proposed method replaces the shared vocabulary with a small language-specific vocabulary and fine-tunes only the new embeddings on the new language's parallel data.
Because the parameters of the original model are not modified, performance on the initial languages does not degrade (a rough sketch of this setup appears after this list).
arXiv Detail & Related papers (2021-10-20T10:38:57Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to the models' limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
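As referenced in the Continual Learning entry above, here is a minimal sketch (not the paper's code) of the frozen-body, trainable-embedding idea, written against a causal BLOOM checkpoint for simplicity; the checkpoint and vocabulary size are placeholders, and the original work applies this to an encoder-decoder NMT model trained on parallel data.

```python
# Minimal sketch of language-specific-embedding adaptation: freeze the pretrained
# body and train only a fresh embedding table for a small new vocabulary.
# The checkpoint and vocabulary size below are placeholders, not the paper's setup.
import torch
from torch import nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Freeze every pretrained parameter so behaviour on the original languages is preserved.
for param in model.parameters():
    param.requires_grad = False

# Replace the shared embedding table with a small language-specific one.
new_vocab_size = 8000                                   # hypothetical vocabulary size
new_embeddings = nn.Embedding(new_vocab_size, model.config.hidden_size)
model.set_input_embeddings(new_embeddings)
model.config.vocab_size = new_vocab_size
model.tie_weights()          # BLOOM ties the LM head to the input embeddings

# Only the new embedding matrix receives gradients during training.
optimizer = torch.optim.AdamW(new_embeddings.parameters(), lr=5e-4)
# ...then train as usual on the new language's data, updating only these embeddings.
```

Because the rest of the network never changes, checkpoints for additional languages reduce to their embedding tables, which is the property the entry above highlights.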