A Unified Framework to Quantify Cultural Intelligence of AI
- URL: http://arxiv.org/abs/2603.01211v1
- Date: Sun, 01 Mar 2026 18:14:52 GMT
- Title: A Unified Framework to Quantify Cultural Intelligence of AI
- Authors: Sunipa Dev, Vinodkumar Prabhakaran, Rutledge Chin Feman, Aida Davani, Remi Denton, Charu Kalia, Piyawat L Kumjorn, Madhurima Maji, Rida Qadri, Negar Rostamzadeh, Renee Shelby, Romina Stella, Hayk Stepanyan, Erin van Liemt, Aishwarya Verma, Oscar Wahltinez, Edem Wornyo, Andrew Zaldivar, Saška Mojsilović
- Abstract summary: We present a principled framework to aggregate multifaceted indicators of cultural capabilities into a unified assessment of cultural intelligence. We conceptualize cultural intelligence as a suite of core capabilities spanning diverse domains, which we then operationalize through a set of indicators.
- Score: 16.75352229274232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As generative AI technologies are increasingly being launched across the globe, assessing their competence to operate in different cultural contexts is exigently becoming a priority. While recent years have seen numerous and much-needed efforts on cultural benchmarking, these efforts have largely focused on specific aspects of culture and evaluation. While these efforts contribute to our understanding of cultural competence, a unified and systematic evaluation approach is needed for us as a field to comprehensively assess diverse cultural dimensions at scale. Drawing on measurement theory, we present a principled framework to aggregate multifaceted indicators of cultural capabilities into a unified assessment of cultural intelligence. We start by developing a working definition of culture that includes identifying core domains of culture. We then introduce a broad-purpose, systematic, and extensible framework for assessing cultural intelligence of AI systems. Drawing on theoretical framing from psychometric measurement validity theory, we decouple the background concept (i.e., cultural intelligence) from its operationalization via measurement. We conceptualize cultural intelligence as a suite of core capabilities spanning diverse domains, which we then operationalize through a set of indicators designed for reliable measurement. Finally, we identify the considerations, challenges, and research pathways to meaningfully measure these indicators, specifically focusing on data collection, probing strategies, and evaluation metrics.
Related papers
- 'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs [0.0]
This paper argues for a shift towards integrating qualitative approaches into AI alignment practices. Drawing inspiration from Clifford Geertz's concept of "thick description," we propose that AI systems must produce outputs that reflect deeper cultural meanings.
arXiv Detail & Related papers (2025-09-30T12:22:53Z) - CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models. Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification. Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z) - Culture is Everywhere: A Call for Intentionally Cultural Evaluation [36.20861746863831]
We argue for intentionally cultural evaluation: an approach that systematically examines the cultural assumptions embedded in all aspects of evaluation. We discuss implications and future directions for moving beyond current benchmarking practices.
arXiv Detail & Related papers (2025-09-01T09:39:21Z) - Toward Socially Aware Vision-Language Models: Evaluating Cultural Competence Through Multimodal Story Generation [2.0467354053171243]
We present the first comprehensive evaluation of the cultural competence of Vision-Language Models (VLMs) through multimodal story generation. Our analysis reveals significant cultural adaptation capabilities, with rich culturally-specific vocabulary spanning names, familial terms, and geographic markers. We uncover concerning limitations: cultural competence varies dramatically across architectures, some models exhibit inverse cultural alignment, and automated metrics show architectural bias contradicting human assessments.
arXiv Detail & Related papers (2025-08-22T19:39:02Z) - CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization [50.90288681622152]
Large Language Models (LLMs) are becoming more deeply integrated into human life across various regions. Existing approaches develop culturally aligned LLMs through fine-tuning with culture-specific corpora. We introduce CAReDiO, a novel cultural data construction framework.
arXiv Detail & Related papers (2025-04-09T13:40:13Z) - CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries [63.00147630084146]
Vision-language models (VLMs) have advanced human-AI interaction but struggle with cultural understanding. CultureVerse is a large-scale multimodal benchmark covering 19,682 cultural concepts, 188 countries/regions, 15 cultural concepts, and 3 question types. We propose CultureVLM, a series of VLMs fine-tuned on our dataset to achieve significant performance improvement in cultural understanding.
arXiv Detail & Related papers (2025-01-02T14:42:37Z) - CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts [45.77570690529597]
We introduce CROPE, a visual question answering benchmark designed to probe the knowledge of culture-specific concepts. Our evaluation of several state-of-the-art open Vision and Language models shows large performance disparities between culture-specific and common concepts. Experiments with contextual knowledge indicate that models struggle to effectively utilize multimodal information and bind culture-specific concepts to their depictions.
arXiv Detail & Related papers (2024-10-20T17:31:19Z) - Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z) - Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.