Okapi: Instruction-tuned Large Language Models in Multiple Languages
with Reinforcement Learning from Human Feedback
- URL: http://arxiv.org/abs/2307.16039v2
- Date: Wed, 2 Aug 2023 00:39:25 GMT
- Title: Okapi: Instruction-tuned Large Language Models in Multiple Languages
with Reinforcement Learning from Human Feedback
- Authors: Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck
Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen
- Abstract summary: We present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages.
Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research.
- Score: 61.83548032416181
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A key technology in the development of large language models (LLMs) is instruction tuning, which aligns the models' responses with human expectations so that they realize their impressive abilities. Two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the
best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for
research and development efforts, various instruction-tuned open-source LLMs
have also been introduced recently, e.g., Alpaca, Vicuna, to name a few.
However, existing open-source LLMs have only been instruction-tuned for English
and a few popular languages, limiting their impact and accessibility for many other languages of the world. Among the few very recent works that explore instruction tuning for LLMs in multiple languages, SFT has been the only approach used. This has left a
significant gap for fine-tuned LLMs based on RLHF in diverse languages and
raised important questions on how RLHF can boost the performance of
multilingual instruction tuning. To overcome this issue, we present Okapi, the
first system with instruction-tuned LLMs based on RLHF for multiple languages.
Okapi introduces instruction and response-ranked data in 26 diverse languages
to facilitate the experiments and development of future multilingual LLM
research. We also present benchmark datasets to enable the evaluation of
generative LLMs in multiple languages. Our experiments demonstrate the
advantages of RLHF for multilingual instruction tuning over SFT for different base
models and datasets. Our framework and resources are released at
https://github.com/nlp-uoregon/Okapi.
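
For readers unfamiliar with the RLHF recipe the abstract refers to, the sketch below shows the standard pairwise ranking loss commonly used to train a reward model on response-ranked data of the kind Okapi provides per language. This is a minimal illustration under that assumption, not code from the Okapi release: the function name, tensors, and scores are invented for the example, and a full pipeline would additionally include the SFT stage and a policy-optimization stage (e.g., PPO) driven by the trained reward model.

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(chosen_scores: torch.Tensor,
                        rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry-style pairwise loss: push the reward-model score of the
    preferred response above the score of the less-preferred one."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Hypothetical scores a reward model might assign to three
# (preferred, dispreferred) response pairs for instructions in one language.
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.4, 0.5, -0.1])
print(reward_ranking_loss(chosen, rejected))  # scalar training loss
```
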
Related papers
- MindMerger: Efficient Boosting LLM Reasoning in non-English Languages [26.334092384176518]
Reasoning capabilities are crucial for Large Language Models (LLMs).
We propose MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models.
MindMerger consistently outperforms all baselines, especially in low-resource languages.
arXiv Detail & Related papers (2024-05-27T17:41:54Z)
- Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on question translation data (i.e., without annotated answers) are able to encourage alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs (an illustrative sketch of this idea follows after this list).
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset [69.33424532827608]
Open-source large language models (LLMs) have gained significant strength across diverse fields.
In this work, we construct an open-source multilingual supervised fine-tuning dataset.
The resulting UltraLink dataset comprises approximately 1 million samples across five languages.
arXiv Detail & Related papers (2024-02-07T05:05:53Z)
- How Vocabulary Sharing Facilitates Multilingualism in LLaMA? [19.136382859468693]
Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations on other languages.
This study examines the multilingual capability of LLMs from the vocabulary-sharing perspective.
arXiv Detail & Related papers (2023-11-15T16:13:14Z)
- PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning [46.153828074152436]
We propose a pivot-language-guided generation approach to enhance instruction tuning in lower-resource languages.
It trains the model to first process instructions in the pivot language, and then produce responses in the target language.
Our approach improves the instruction-following abilities of LLMs by 29% on average.
arXiv Detail & Related papers (2023-11-15T05:28:07Z)
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed.
We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z)
- Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs [16.770697902481107]
Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities.
We propose a systematic way of quantifying the performance disparities of LLMs under multilingual settings.
The results show that GPT exhibits highly translation-like behaviour in multilingual settings.
arXiv Detail & Related papers (2023-05-24T02:05:03Z)
- InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning [66.31509106146605]
Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages.
However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data.
We propose InstructAlign, which uses continual crosslingual instruction tuning to enable LLMs to align new unseen languages with previously learned high-resource languages.
arXiv Detail & Related papers (2023-05-23T02:51:34Z)
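
As referenced in the Language-Specific Neurons entry above, the following is a rough sketch of one plausible reading of language activation probability entropy (LAPE), based only on the one-sentence summary: estimate, for each neuron, how often it activates on text from each language, normalize those activation probabilities into a distribution, and treat low entropy as evidence that the neuron is language-specific. The names, shapes, and selection rule are assumptions made for illustration, not the paper's implementation.

```python
import torch

def lape_scores(activation_prob: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """activation_prob: (num_neurons, num_languages) tensor whose entry [i, j]
    is the estimated probability that neuron i activates on language j.
    Returns one entropy score per neuron; lower entropy suggests the neuron
    is more language-specific."""
    dist = activation_prob / (activation_prob.sum(dim=1, keepdim=True) + eps)
    return -(dist * torch.log(dist + eps)).sum(dim=1)

# Toy example: neuron 0 fires almost only on language 0 (low entropy),
# neuron 1 fires roughly equally across three languages (high entropy).
probs = torch.tensor([[0.90, 0.02, 0.03],
                      [0.30, 0.35, 0.33]])
scores = lape_scores(probs)
language_specific = torch.topk(scores, k=1, largest=False).indices  # lowest-entropy neurons
print(scores, language_specific)
```
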