Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand
for Multilingual Instructions?
- URL: http://arxiv.org/abs/2402.13703v1
- Date: Wed, 21 Feb 2024 11:07:07 GMT
- Title: Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand
for Multilingual Instructions?
- Authors: Alexander Arno Weber, Klaudia Thellmann, Jan Ebert, Nicolas
Flores-Herr, Jens Lehmann, Michael Fromm and Mehdi Ali
- Abstract summary: We show that instruction-tuning on parallel instead of monolingual corpora benefits cross-lingual instruction following capabilities by up to 4.6%.
We also conduct a human annotation study to understand the alignment between human-based and GPT-4-based evaluation within multilingual chat scenarios.
- Score: 44.2017377417911
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The adaption of multilingual pre-trained Large Language Models (LLMs) into
eloquent and helpful assistants is essential to facilitate their use across
different language regions. In that spirit, we are the first to conduct an
extensive study of the performance of multilingual models on parallel,
multi-turn instruction-tuning benchmarks across a selection of the most-spoken
Indo-European languages. We systematically examine the effects of language and
instruction dataset size on a mid-sized, multilingual LLM by instruction-tuning
it on parallel instruction-tuning datasets. Our results demonstrate that
instruction-tuning on parallel instead of monolingual corpora benefits
cross-lingual instruction following capabilities by up to 4.6%. Furthermore, we
show that the Superficial Alignment Hypothesis does not hold in general, as the
investigated multilingual 7B parameter model presents a counter-example
requiring large-scale instruction-tuning datasets. Finally, we conduct a human
annotation study to understand the alignment between human-based and
GPT-4-based evaluation within multilingual chat scenarios.
Related papers
- Multilingual Instruction Tuning With Just a Pinch of Multilinguality [31.360147312195068]
We show that many languages transfer some instruction-following capabilities to other languages from even monolingual tuning.
We observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages.
Diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization.
arXiv Detail & Related papers (2024-01-03T17:48:10Z)
- PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational
Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Building High-accuracy Multilingual ASR with Gated Language Experts and
Curriculum Training [45.48362355283723]
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models.
Our method incorporates a gating mechanism and LID loss, enabling transformer experts to learn language-specific information.
arXiv Detail & Related papers (2023-03-01T19:20:01Z)
- Multilingual Multimodal Learning with Machine Translated Text [27.7207234512674]
We investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data.
We propose two metrics for automatically removing such translations from the resulting datasets.
In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning.
arXiv Detail & Related papers (2022-10-24T11:41:20Z)
- Bootstrapping Multilingual Semantic Parsers using Large Language Models [28.257114724384806]
The translate-train paradigm of transferring English datasets across multiple languages remains the key ingredient for training task-specific multilingual models.
We consider the task of multilingual semantic parsing and demonstrate the effectiveness and flexibility offered by large language models (LLMs) for translating English datasets into several languages via few-shot prompting.
arXiv Detail & Related papers (2022-10-13T19:34:14Z)
- On Efficiently Acquiring Annotations for Multilingual Models [12.304046317362792]
We show that the strategy of joint learning across multiple languages using a single model performs substantially better than the aforementioned alternatives.
We show that this simple approach enables the model to be data efficient by allowing it to arbitrate its annotation budget to query languages it is less certain on.
arXiv Detail & Related papers (2022-04-03T07:42:13Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- Multilingual Translation with Extensible Multilingual Pretraining and
Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
- XPersona: Evaluating Multilingual Personalized Chatbot [76.00426517401894]
We propose a multi-lingual extension of Persona-Chat, namely XPersona.
Our dataset includes persona conversations in six different languages other than English for building and evaluating multilingual personalized agents.
arXiv Detail & Related papers (2020-03-17T07:52:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.