Crosslingual Generalization through Multitask Finetuning
- URL: http://arxiv.org/abs/2211.01786v2
- Date: Mon, 29 May 2023 16:40:37 GMT
- Title: Crosslingual Generalization through Multitask Finetuning
- Authors: Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts,
Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng-Xin Yong,
Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid
Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, Colin
Raffel
- Abstract summary: Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting.
We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0.
We find finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages.
- Score: 80.8822603322471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multitask prompted finetuning (MTF) has been shown to help large language
models generalize to new tasks in a zero-shot setting, but so far explorations
of MTF have focused on English data and models. We apply MTF to the pretrained
multilingual BLOOM and mT5 model families to produce finetuned variants called
BLOOMZ and mT0. We find finetuning large multilingual language models on
English tasks with English prompts allows for task generalization to
non-English languages that appear only in the pretraining corpus. Finetuning on
multilingual tasks with English prompts further improves performance on English
and non-English tasks leading to various state-of-the-art zero-shot results. We
also investigate finetuning on multilingual tasks with prompts that have been
machine-translated from English to match the language of each dataset. We find
training on these machine-translated prompts leads to better performance on
human-written prompts in the respective languages. Surprisingly, we find models
are capable of zero-shot generalization to tasks in languages they have never
intentionally seen. We conjecture that the models are learning higher-level
capabilities that are both task- and language-agnostic. In addition, we
introduce xP3, a composite of supervised datasets in 46 languages with English
and machine-translated prompts. Our code, datasets and models are freely
available at https://github.com/bigscience-workshop/xmtf.
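As a concrete illustration of the zero-shot usage the abstract describes, the sketch below feeds an English prompt over non-English input to an mT0 checkpoint through the Hugging Face transformers API. The checkpoint identifier "bigscience/mt0-small" is an assumption based on the linked repository rather than a detail given in the abstract:

    # Minimal sketch: zero-shot prompted inference with an mT0-style model.
    # Assumption: the finetuned checkpoints are hosted on the Hugging Face Hub
    # under names such as "bigscience/mt0-small" (not stated in the abstract).
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    checkpoint = "bigscience/mt0-small"  # assumed Hub identifier
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # An English prompt applied to non-English input, mirroring the finding
    # that English-prompt finetuning transfers to other languages.
    prompt = "Translate to English: Je t'aime."
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The decoder-only BLOOMZ variants can be driven the same way through AutoModelForCausalLM; the xP3 prompt collection itself is released alongside the code and models in the repository above.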
Related papers
- Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed? [40.13166574854085]
We investigate the minimal amount of multilinguality required to elicit cross-lingual generalisation in English-centric large language models.
We find that multilingual instruction tuning with as few as two to three languages is both necessary and sufficient to elicit effective cross-lingual generalisation.
arXiv Detail & Related papers (2023-12-20T00:49:52Z)
- Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing a few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Bootstrapping Multilingual Semantic Parsers using Large Language Models [28.257114724384806]
The translate-train paradigm of transferring English datasets across multiple languages remains the key ingredient for training task-specific multilingual models.
We consider the task of multilingual semantic parsing and demonstrate the effectiveness and flexibility offered by large language models (LLMs) for translating English datasets into several languages via few-shot prompting.
arXiv Detail & Related papers (2022-10-13T19:34:14Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding [24.149299722716155]
We introduce xSID, a new benchmark for cross-lingual Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect.
We propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer.
Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
arXiv Detail & Related papers (2021-05-15T23:51:11Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
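The last entry describes extending a pretrained multilingual model to additional languages and finetuning it for many-to-many translation. If the released checkpoints correspond to the mBART-50 models on the Hugging Face Hub (an assumption based on the title, not something the summary states), a direct non-English-to-non-English translation call might look like this sketch:

    # Hypothetical usage sketch for a many-to-many multilingual translation
    # model; the checkpoint name below is an assumed Hugging Face Hub identifier.
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    name = "facebook/mbart-large-50-many-to-many-mmt"  # assumed checkpoint
    tokenizer = MBart50TokenizerFast.from_pretrained(name)
    model = MBartForConditionalGeneration.from_pretrained(name)

    # Translate French directly into German, without pivoting through English.
    tokenizer.src_lang = "fr_XX"
    encoded = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"]
    )
    print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])

Setting src_lang on the tokenizer and forcing the target language code as the first generated token is how these multilingual checkpoints select a translation direction.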