Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
- URL: http://arxiv.org/abs/2505.21979v3
- Date: Fri, 26 Sep 2025 21:54:35 GMT
- Title: Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
- Authors: Fakhraddin Alwajih, Samar M. Magdy, Abdellah El Mekki, Omer Nacar, Youssef Nafea, Safaa Taher Abdelfadil, Abdulfattah Mohammed Yahya, Hamzah Luqman, Nada Almarwani, Samah Aloufi, Baraah Qawasmen, Houdaifa Atou, Serry Sibaee, Hamzah A. Alsayadi, Walid Al-Dhabyani, Maged S. Al-shaibani, Aya El Aatar, Nour Qandos, Rahaf Alhamouri, Samar Ahmad, Mohammed Anwar Al-Ghrawi, Aminetou Yacoub, Ruwa AbuHweidi, Vatimetou Mohamed Lemin, Reem Abdel-Salam, Ahlam Bashiti, Aisha Alansari, Ahmed Ashraf, Nora Alturayeif, Alcides Alcoba Inciarte, Adel Ammar, Abdelrahim A. Elmadany, Mohamedou Cheikh Tourad, Ismail Berrada, Mustafa Jarrar, Shady Shehata, Muhammad Abdul-Mageed,
- Abstract summary: PEARL is a large-scale Arabic multimodal dataset and benchmark designed for cultural understanding.<n>PEARL comprises over 309K examples spanning ten culturally significant domains covering all Arab countries.
- Score: 28.016981736730617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally-informed multimodal modeling research. All datasets and benchmarks are publicly available.
Related papers
- MMA-ASIA: A Multilingual and Multimodal Alignment Framework for Culturally-Grounded Evaluation [91.22008265721952]
MMA-ASIA centers on a human-curated, multilingual, and multimodally aligned benchmark covering 8 Asian countries and 10 languages.<n>This is the first dataset aligned at the input level across three modalities: text, image (visual question answering), and speech.<n>We propose a five-dimensional evaluation protocol that measures: (i) cultural-awareness disparities across countries, (ii) cross-lingual consistency, (iii) cross-modal consistency, (iv) cultural knowledge generalization, and (v) grounding validity.
arXiv Detail & Related papers (2025-10-07T14:12:12Z) - DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture [14.681676046750342]
DRISHTIKON is a first-of-its-kind multimodal and multilingual benchmark centered exclusively on Indian culture.<n>The dataset captures rich cultural themes including festivals, attire, cuisines, art forms, and historical heritage.<n>We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models.
arXiv Detail & Related papers (2025-09-23T17:40:43Z) - CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models.<n>Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification.<n> Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z) - Grounding Multilingual Multimodal LLMs With Cultural Knowledge [48.95126394270723]
We propose a data-centric approach that grounds MLLMs in cultural knowledge.<n>CulturalGround comprises 22 million high-quality, culturally-rich VQA pairs spanning 42 countries and 39 languages.<n>We train an open-source MLLM CulturalPangea on CulturalGround, interleaving standard multilingual instruction-tuning data to preserve general abilities.
arXiv Detail & Related papers (2025-08-10T16:24:11Z) - CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis [41.261808170896686]
CulFiT is a novel training paradigm that leverages multilingual data and fine-grained reward modeling to enhance cultural sensitivity and inclusivity.<n>Our approach synthesizes diverse cultural-related questions, constructs critique data in culturally relevant languages, and employs fine-grained rewards to decompose cultural texts into verifiable knowledge units.
arXiv Detail & Related papers (2025-05-26T04:08:26Z) - TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs [13.069833806549914]
We propose the Traditional Chinese Culture understanding Benchmark (TCC-Bench) for assessing the understanding of traditional Chinese culture.<n>TCC-Bench comprises culturally rich and visually diverse data, incorporating images from museum artifacts, everyday life scenes, comics, and other culturally significant contexts.<n>We adopt a semi-automated pipeline that utilizes GPT-4o in text-only mode to generate candidate questions, followed by human curation to ensure data quality and avoid potential data leakage.
arXiv Detail & Related papers (2025-05-16T14:10:41Z) - WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models [1.094065133109559]
Large Language Models (LLMs) are predominantly trained and aligned in ways that reinforce Western-centric epistemologies and socio-cultural norms.<n>We introduce WorldView-Bench, a benchmark designed to evaluate Global Cultural Inclusivity (GCI) in LLMs by analyzing their ability to accommodate diverse worldviews.
arXiv Detail & Related papers (2025-05-14T17:43:40Z) - CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization [50.90288681622152]
Large Language Models (LLMs) more deeply integrate into human life across various regions.<n>Existing approaches develop culturally aligned LLMs through fine-tuning with culture-specific corpora.<n>We introduce CAReDiO, a novel cultural data construction framework.
arXiv Detail & Related papers (2025-04-09T13:40:13Z) - Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs [14.874327728051288]
We introduce our dataset, a year-long community-driven project covering all 22 Arab countries.<n>The dataset includes instructions in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics.<n>We use our dataset to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations.
arXiv Detail & Related papers (2025-02-28T19:59:13Z) - Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models [26.051898880298126]
Investigating value alignment in large language models based on cultural context has become a critical area of research.<n>Similar biases have not been extensively explored in large vision-language models (VLMs)
arXiv Detail & Related papers (2025-02-18T19:03:02Z) - CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries [63.00147630084146]
Vision-language models (VLMs) have advanced human-AI interaction but struggle with cultural understanding.<n>CultureVerse is a large-scale multimodal benchmark covering 19, 682 cultural concepts, 188 countries/regions, 15 cultural concepts, and 3 question types.<n>We propose CultureVLM, a series of VLMs fine-tuned on our dataset to achieve significant performance improvement in cultural understanding.
arXiv Detail & Related papers (2025-01-02T14:42:37Z) - Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages [55.36534539177367]
This paper introduces Pangea, a multilingual multimodal large language model (MLLM) trained on a diverse 6M instruction dataset spanning 39 languages.<n>P Pangea significantly outperforms existing open-source models in multilingual settings and diverse cultural contexts.<n>We fully open-source our data, code, and trained checkpoints, to facilitate the development of inclusive and robust multilingual MLLMs.
arXiv Detail & Related papers (2024-10-21T16:19:41Z) - Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models [22.92083941222383]
We introduce DalleStreet, a large-scale dataset generated by DALL-E 3 and validated by humans.
We find disparities in cultural understanding at geographic sub-region levels with both open-source (LLaVA) and closed-source (GPT-4V) models.
Our findings reveal a nuanced picture of the cultural competence of LMMs, highlighting the need to develop culture-aware systems.
arXiv Detail & Related papers (2024-07-02T08:55:41Z) - CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection.
It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs.
We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z) - CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge [69.82940934994333]
We introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build challenging evaluation dataset.
Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating cultural questions.
CULTURALBENCH-V0.1 is a compact yet high-quality evaluation dataset with users' red-teaming attempts.
arXiv Detail & Related papers (2024-04-10T00:25:09Z) - Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z) - CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models [41.885600036131045]
CDEval is a benchmark aimed at evaluating the cultural dimensions of Large Language Models.
It is constructed by incorporating both GPT-4's automated generation and human verification, covering six cultural dimensions across seven domains.
arXiv Detail & Related papers (2023-11-28T02:01:25Z) - AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic.
The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv Detail & Related papers (2023-09-21T13:20:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.