XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs
- URL: http://arxiv.org/abs/2601.14063v1
- Date: Tue, 20 Jan 2026 15:21:18 GMT
- Title: XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs
- Authors: Mohsinul Kabir, Tasnim Ahmed, Md Mezbaur Rahman, Shaoxiong Ji, Hassan Alhuzali, Sophia Ananiadou
- Abstract summary: Cross-cultural competence in large language models (LLMs) requires the ability to identify Culture-Specific Items (CSIs). We introduce XCR-Bench, a Cross(X)-Cultural Reasoning Benchmark consisting of 4.9k parallel sentences and 1,098 unique CSIs. Our findings show that state-of-the-art LLMs exhibit consistent weaknesses in identifying and adapting CSIs related to social etiquette and cultural reference.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Cross-cultural competence in large language models (LLMs) requires the ability to identify Culture-Specific Items (CSIs) and to adapt them appropriately across cultural contexts. Progress in evaluating this capability has been constrained by the scarcity of high-quality CSI-annotated corpora with parallel cross-cultural sentence pairs. To address this limitation, we introduce XCR-Bench, a Cross(X)-Cultural Reasoning Benchmark consisting of 4.9k parallel sentences and 1,098 unique CSIs, spanning three distinct reasoning tasks with corresponding evaluation metrics. Our corpus integrates Newmark's CSI framework with Hall's Triad of Culture, enabling systematic analysis of cultural reasoning beyond surface-level artifacts and into semi-visible and invisible cultural elements such as social norms, beliefs, and values. Our findings show that state-of-the-art LLMs exhibit consistent weaknesses in identifying and adapting CSIs related to social etiquette and cultural reference. Additionally, we find evidence that LLMs encode regional and ethno-religious biases even within a single linguistic setting during cultural adaptation. We release our corpus and code to facilitate future research on cross-cultural NLP.
Related papers
- LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction [57.23766971626989]
Large language models (LLMs) encode rich cultural knowledge learned from diverse web-scale data. We present an iterative, prompt-based framework for constructing a Cultural Commonsense Knowledge Graph (CCKG). We find that the cultural knowledge graphs are better realized in English, even when the target culture is non-English.
arXiv Detail & Related papers (2026-01-25T20:05:04Z) - Do Large Language Models Truly Understand Cross-cultural Differences? [53.481048019144644]
We develop a scenario-based benchmark to evaluate large language models' cross-cultural understanding and reasoning. Grounded in cultural theory, we categorize cross-cultural capabilities into nine dimensions. The dataset supports continuous expansion, and experiments confirm its transferability to other languages.
arXiv Detail & Related papers (2025-12-08T01:21:58Z) - Culturally-Aware Conversations: A Framework & Benchmark for LLMs [8.314136556868563]
Existing benchmarks that measure cultural adaptation in LLMs are misaligned with the actual challenges these models face when interacting with users from diverse cultural backgrounds. Grounded in sociocultural theory, our framework formalizes how linguistic style is shaped by situational, relational, and cultural context. We construct a benchmark dataset based on this framework, annotated by culturally diverse raters, and propose a new set of desiderata for cross-cultural evaluation in NLP.
arXiv Detail & Related papers (2025-10-13T16:06:14Z) - 'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs [0.0]
This paper argues for a shift towards integrating qualitative approaches into AI alignment practices. Drawing inspiration from Clifford Geertz's concept of "thick description," we propose that AI systems must produce outputs that reflect deeper cultural meanings.
arXiv Detail & Related papers (2025-09-30T12:22:53Z) - CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models. Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification. Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z) - From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association Test [50.51344198689069]
We extend the human-centered word association test (WAT) to assess the alignment of large language models with cross-cultural cognition. To address culture preference, we propose CultureSteer, an innovative approach that embeds culture-specific semantic associations directly within the model's internal representation space.
arXiv Detail & Related papers (2025-05-24T07:05:10Z) - Cultural Learning-Based Culture Adaptation of Language Models [70.1063219524999]
Adapting large language models (LLMs) to diverse cultural values is a challenging task. We present CLCA, a novel framework for enhancing LLM alignment with cultural values based on cultural learning.
arXiv Detail & Related papers (2025-04-03T18:16:26Z) - Culture is Not Trivia: Sociocultural Theory for Cultural NLP [10.76392030245232]
We argue that these methodological limitations are symptomatic of a theoretical gap. We draw on a well-developed theory of culture from sociocultural linguistics to fill this gap.
arXiv Detail & Related papers (2025-02-17T17:25:11Z) - Benchmarking Machine Translation with Cultural Awareness [50.183458829028226]
Translating culture-related content is vital for effective cross-cultural communication.
Many culture-specific items (CSIs) often lack viable translations across languages.
This difficulty hinders the analysis of cultural awareness of machine translation systems.
arXiv Detail & Related papers (2023-05-23T17:56:33Z)