TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten
Arabic Varieties
- URL: http://arxiv.org/abs/2308.03051v2
- Date: Mon, 23 Oct 2023 23:26:55 GMT
- Authors: Karima Kadaoui, Samar M. Magdy, Abdul Waheed, Md Tawkat Islam
Khondaker, Ahmed Oumar El-Shangiti, El Moatez Billah Nagoudi, Muhammad
Abdul-Mageed
- Abstract summary: We assess Bard and ChatGPT regarding their machine translation proficiencies across ten varieties of Arabic.
Our evaluation covers diverse Arabic varieties such as Classical Arabic (CA), Modern Standard Arabic (MSA), and several country-level dialectal variants.
However, on CA and MSA, instruction-tuned LLMs trail behind commercial systems such as Google Translate.
- Score: 18.73290429469502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the purported multilingual proficiency of instruction-finetuned large
language models (LLMs) such as ChatGPT and Bard, the linguistic inclusivity of
these models remains insufficiently explored. Considering this constraint, we
present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5
and GPT-4) regarding their machine translation proficiencies across ten
varieties of Arabic. Our evaluation covers diverse Arabic varieties such as
Classical Arabic (CA), Modern Standard Arabic (MSA), and several country-level
dialectal variants. Our analysis indicates that LLMs may encounter challenges
with dialects for which minimal public datasets exist, but on average are
better translators of dialects than existing commercial systems. On CA and MSA,
instruction-tuned LLMs, however, trail behind commercial systems such as Google
Translate. Finally, we undertake a human-centric study to scrutinize the
efficacy of the relatively recent model, Bard, in following human instructions
during translation tasks. Our analysis reveals a circumscribed capability of
Bard in aligning with human instructions in translation contexts. Collectively,
our findings underscore that prevailing LLMs remain far from inclusive, with
only limited ability to cater for the linguistic and cultural intricacies of
diverse communities.
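Evaluations such as this one typically score system output against reference translations with automatic metrics like BLEU or chrF. As a minimal, self-contained sketch (not the paper's actual evaluation pipeline), the following implements a simplified chrF-like character n-gram F-score; the example sentences and the beta value are illustrative assumptions:

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams, ignoring whitespace.
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_like(hypothesis, reference, max_n=4, beta=2.0):
    # Simplified chrF: average character n-gram precision and recall
    # over orders 1..max_n, combined into an F-beta score.
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p = sum(precisions) / max_n
    r = sum(recalls) / max_n
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# Identical strings score 1.0; disjoint strings score 0.0.
print(chrf_like("hello world", "hello world"))
print(chrf_like("hello there", "hello world"))
```

Production evaluations would instead use a standard implementation (e.g. the sacrebleu package) so that scores are comparable across papers.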
Related papers
- Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation [33.08089616645845]
The advent of Large Language Models (LLMs) has significantly reshaped the landscape of machine translation (MT).
We analyze techniques such as few-shot prompting, cross-lingual transfer, and parameter-efficient fine-tuning that enable effective adaptation to under-resourced settings.
We discuss persistent challenges such as hallucinations, evaluation inconsistencies, and inherited biases while also evaluating emerging LLM-driven metrics for translation quality.
arXiv Detail & Related papers (2025-04-02T17:26:40Z)
- Jawaher: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking [12.078532717928185]
Large language models (LLMs) continue to exhibit biases toward Western, Anglo-centric, or American cultures.
We introduce Jawaher, a benchmark designed to assess LLMs' capacity to comprehend and interpret Arabic proverbs.
We find that while LLMs can generate idiomatically accurate translations, they struggle with producing culturally nuanced and contextually relevant explanations.
arXiv Detail & Related papers (2025-02-28T22:28:00Z)
- Controlled Evaluation of Syntactic Knowledge in Multilingual Language Models [16.414150004715367]
This study develops targeted syntactic evaluation tests for three low-resource languages.
We use them to evaluate five families of open-access multilingual Transformer LMs.
We find that some syntactic tasks prove relatively easy for LMs while others are challenging.
arXiv Detail & Related papers (2024-11-12T01:26:41Z)
- One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We present the first study aimed at objectively assessing the fairness and robustness of Large Language Models (LLMs) in handling dialects in canonical reasoning tasks.
We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K.
Our findings reveal that almost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z)
- AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs [22.121471902726892]
We present AraDiCE, a benchmark for Arabic Dialect and Cultural Evaluation.
It is the first fine-grained benchmark designed to evaluate cultural awareness across the Gulf, Egypt, and Levant regions.
We will release the dialectal translation models and benchmarks curated in this study.
arXiv Detail & Related papers (2024-09-17T17:59:25Z)
- Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance [6.907734681124986]
This paper strategically identifies the need for linguistic equity by examining several knowledge editing techniques in multilingual contexts.
We evaluate the performance of models such as Mistral, TowerInstruct, OpenHathi, Tamil-Llama, and Kan-Llama across languages including English, German, French, Italian, Spanish, Hindi, Tamil, and Kannada.
arXiv Detail & Related papers (2024-06-17T01:54:27Z)
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z)
- Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic [1.0878040851638]
We employ GPT-4 as a uniform evaluator for both English and Arabic queries to assess and compare the performance of the LLMs on various open-ended tasks.
We find that fine-tuned base models using multilingual and multi-turn datasets could be competitive to models trained from scratch on multilingual data.
arXiv Detail & Related papers (2023-10-23T11:40:04Z)
- ChatGPT MT: Competitive for High- (but not Low-) Resource Languages [62.178282377729566]
Large language models (LLMs) implicitly learn to perform a range of language tasks, including machine translation (MT).
We present the first experimental evidence for an expansive set of 204 languages, along with MT cost analysis.
Our analysis reveals that a language's resource level is the most important feature in determining ChatGPT's relative ability to translate it.
arXiv Detail & Related papers (2023-09-14T04:36:00Z)
- Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
- Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions [68.01449013641532]
Large-scale Pretrained Language Models (LLMs) have shown strong abilities in multilingual translations.
We present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation.
arXiv Detail & Related papers (2023-05-24T12:00:24Z)
- Large language models effectively leverage document-level context for literary translation, but critical errors persist [32.54546652197316]
Large language models (LLMs) are competitive with the state of the art on a wide range of sentence-level translation datasets.
We show through a rigorous human evaluation that asking the GPT-3.5 (text-davinci-003) LLM to translate an entire literary paragraph results in higher-quality translations.
arXiv Detail & Related papers (2023-04-06T17:27:45Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.