Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction
Following: A Case Study of Arabic
- URL: http://arxiv.org/abs/2310.14819v1
- Date: Mon, 23 Oct 2023 11:40:04 GMT
- Title: Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction
Following: A Case Study of Arabic
- Authors: Sabri Boughorbel, Majd Hawasly
- Abstract summary: We employ GPT-4 as a uniform evaluator for both English and Arabic queries to assess and compare the performance of the LLMs on various open-ended tasks.
We find that fine-tuned base models using multilingual and multi-turn datasets could be competitive to models trained from scratch on multilingual data.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While significant progress has been made in benchmarking Large Language
Models (LLMs) across various tasks, there is a lack of comprehensive evaluation
of their abilities in responding to multi-turn instructions in less-commonly
tested languages like Arabic. Our paper offers a detailed examination of the
proficiency of open LLMs in such scenarios in Arabic. Utilizing a customized
Arabic translation of the MT-Bench benchmark suite, we employ GPT-4 as a
uniform evaluator for both English and Arabic queries to assess and compare the
performance of the LLMs on various open-ended tasks. Our findings reveal
variations in model responses on different task categories, e.g., logic vs.
literacy, when instructed in English or Arabic. We find that fine-tuned base
models using multilingual and multi-turn datasets could be competitive to
models trained from scratch on multilingual data. Finally, we hypothesize that
an ensemble of small, open LLMs could perform competitively to proprietary LLMs
on the benchmark.
Related papers
- ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models [0.0]
ArabLegalEval is a benchmark dataset for assessing the Arabic legal knowledge of Large Language Models (LLMs)
Inspired by the MMLU and LegalBench datasets, ArabLegalEval consists of multiple tasks sourced from Saudi legal documents and synthesized questions.
We aim to analyze the capabilities required to solve legal problems in Arabic and benchmark the performance of state-of-the-art LLMs.
arXiv Detail & Related papers (2024-08-15T07:09:51Z) - Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z) - Tower: An Open Multilingual Large Language Model for Translation-Related
Tasks [27.237316809769975]
We propose a recipe for tailoring large language models (LLMs) to multiple tasks present in translation.
Our final model surpasses open alternatives on several tasks relevant to translation and is competitive with general-purpose closed LLMs.
arXiv Detail & Related papers (2024-02-27T18:09:36Z) - OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large
Language Models [59.54423478596468]
We introduce OMGEval, the first Open-source Multilingual Generative test set that can assess the capability of LLMs in different languages.
For each language, OMGEval provides 804 open-ended questions, covering a wide range of important capabilities of LLMs.
Specifically, the current version of OMGEval includes 5 languages (i.e., Zh, Ru, Fr, Es, Ar)
arXiv Detail & Related papers (2024-02-21T04:42:41Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for
Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba)
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z) - Extrapolating Large Language Models to Non-English by Aligning Languages [109.09051737966178]
Existing large language models show disparate capability across different languages.
In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages.
arXiv Detail & Related papers (2023-08-09T13:32:06Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - Don't Trust ChatGPT when Your Question is not in English: A Study of
Multilingual Abilities and Types of LLMs [16.770697902481107]
Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities.
We propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings.
The results show that GPT exhibits highly translating-like behaviour in multilingual settings.
arXiv Detail & Related papers (2023-05-24T02:05:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.