CamelEval: Advancing Culturally Aligned Arabic Language Models and Benchmarks
- URL: http://arxiv.org/abs/2409.12623v2
- Date: Tue, 24 Sep 2024 08:49:21 GMT
- Title: CamelEval: Advancing Culturally Aligned Arabic Language Models and Benchmarks
- Authors: Zhaozhi Qian, Faroq Altam, Muhammad Alqurishi, Riad Souissi
- Abstract summary: This paper introduces Juhaina, an Arabic-English bilingual LLM specifically designed to align with the values and preferences of Arabic speakers.
Our model contains 9.24 billion parameters and is trained on a context window of up to 8,192 tokens.
- Score: 19.403924294587043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are the cornerstones of modern artificial intelligence systems. This paper introduces Juhaina, an Arabic-English bilingual LLM specifically designed to align with the values and preferences of Arabic speakers. Juhaina natively supports advanced functionalities such as instruction following, open-ended question answering, information provisioning, and text processing. Our model contains 9.24 billion parameters and is trained on a context window of up to 8,192 tokens. This paper details the creation process of Juhaina and provides an extensive empirical evaluation. Furthermore, we identify the limitations of the widely adopted Open Arabic LLM Leaderboard (OALL) and propose a new evaluation benchmark, CamelEval. Our findings demonstrate that Juhaina surpasses existing LLMs of comparable size, such as the Llama and Gemma families, in generating helpful responses in Arabic, providing factually accurate information about the region, and understanding nuanced cultural aspects. We aspire for Juhaina to democratize cutting-edge AI technologies, serving over 400 million Arabic speakers by offering LLMs that not only communicate in their language but also comprehend their culture. We publicly release all models on Huggingface: https://huggingface.co/elmrc
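As a quick orientation, below is a minimal sketch of loading a released checkpoint with the Huggingface transformers library. The repository id elmrc/juhaina is a hypothetical placeholder; the abstract only gives the organization page https://huggingface.co/elmrc.

```python
# Minimal sketch: loading a Juhaina checkpoint with transformers.
# "elmrc/juhaina" is an assumed repo id, not a confirmed repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "elmrc/juhaina"  # placeholder under the released org
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Juhaina is trained with a context window of up to 8,192 tokens.
prompt = "ما هي عاصمة دولة الإمارات؟"  # "What is the capital of the UAE?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```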
Related papers
- AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs [22.121471902726892]
We present AraDiCE, a benchmark for Arabic Dialect and Cultural Evaluation.
It is the first fine-grained benchmark designed to evaluate cultural awareness across the Gulf, Egypt, and Levant regions.
We will release the dialectal translation models and benchmarks curated in this study.
arXiv Detail & Related papers (2024-09-17T17:59:25Z) - ALLaM: Large Language Models for Arabic and English [9.881560166505452]
We present ALLaM: Arabic Large Language Model, a series of large language models built to support the ecosystem of Arabic Language Technologies (ALT).
Our autoregressive decoder-only models demonstrate how second-language acquisition via vocabulary expansion and pretraining can steer a model towards a new language (Arabic) without catastrophic forgetting of the original language (English); a minimal sketch of this vocabulary-expansion step appears after this list.
We show that extensive alignment with human preferences can significantly enhance the performance of a language model compared to models of a larger scale with lower quality alignment.
arXiv Detail & Related papers (2024-07-22T05:35:17Z) - YuLan: An Open-source Large Language Model [179.59272970659677]
This paper presents the development of YuLan, a series of open-source large language models (LLMs) with 12 billion parameters.
The base model of YuLan is pre-trained on approximately 1.7 trillion tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts.
We devise a curriculum-learning framework across the training stages, which helps LLMs learn knowledge in an easy-to-hard manner.
arXiv Detail & Related papers (2024-06-28T11:52:53Z) - Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks [29.819766942335416]
Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension.
We introduce a comprehensive family of Arabic MLLMs, dubbed Peacock, with strong vision and language capabilities.
arXiv Detail & Related papers (2024-03-01T23:38:02Z) - ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic [51.922112625469836]
We present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language.
Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA) and is carefully constructed by collaborating with native speakers in the region.
Our evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models.
arXiv Detail & Related papers (2024-02-20T09:07:41Z) - YAYI 2: Multilingual Open-Source Large Language Models [53.92832054643197]
We propose YAYI 2, including both base and chat models, with 30 billion parameters.
YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline.
The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-12-22T17:34:47Z) - Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM [77.17254959695218]
Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks.
We propose a lightweight Arabic Mini-ClimateGPT that is built on an open-source LLM and fine-tuned on Clima500-Instruct, a conversational-style Arabic instruction-tuning dataset.
Our model surpasses the baseline LLM in 88.3% of cases during ChatGPT-based evaluation.
arXiv Detail & Related papers (2023-12-14T22:04:07Z) - AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic.
The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv Detail & Related papers (2023-09-21T13:20:13Z) - Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models [57.76998376458017]
We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs).
The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts.
We provide a detailed description of the training, the tuning, the safety alignment, and the evaluation of the models.
arXiv Detail & Related papers (2023-08-30T17:07:17Z) - Having Beer after Prayer? Measuring Cultural Bias in Large Language Models [25.722262209465846]
We introduce CAMeL, a novel resource of 628 naturally occurring prompts and 20,368 entities spanning eight types that contrast Arab and Western cultures.
We show that multilingual and Arabic monolingual LMs exhibit bias towards entities associated with Western culture.
Using CAMeL, we examine the cross-cultural performance in Arabic of 16 different LMs on tasks such as story generation, NER, and sentiment analysis.
arXiv Detail & Related papers (2023-05-23T18:27:51Z)
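As referenced in the ALLaM entry above, the following is a minimal sketch of second-language acquisition via vocabulary expansion using the Huggingface transformers API. The base checkpoint, the example Arabic tokens, and the omitted training loop are illustrative assumptions; this digest does not specify ALLaM's actual recipe.

```python
# Sketch: extend an English-centric model's vocabulary with Arabic tokens,
# then continue pretraining so it acquires Arabic without forgetting English.
# All names below are placeholders, not ALLaM's published configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"  # hypothetical English-centric base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# 1) Add Arabic subword tokens (toy examples; in practice these come from
#    a tokenizer trained on a large Arabic corpus).
new_tokens = ["السلام", "عليكم", "اللغة", "العربية"]
num_added = tokenizer.add_tokens(new_tokens)

# 2) Resize the embedding matrix so the new tokens get trainable vectors.
model.resize_token_embeddings(len(tokenizer))

# 3) Continue pretraining on mixed Arabic/English text (loop omitted;
#    any causal-LM trainer applies).
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```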