Identity-Aware Large Language Models require Cultural Reasoning
- URL: http://arxiv.org/abs/2510.18510v1
- Date: Tue, 21 Oct 2025 10:50:51 GMT
- Title: Identity-Aware Large Language Models require Cultural Reasoning
- Authors: Alistair Plum, Anne-Marie Lutgen, Christoph Purschke, Achim Rettinger
- Abstract summary: We define cultural reasoning as the capacity of a model to recognise culture-specific knowledge, values, and social norms. Because culture shapes interpretation, emotional resonance, and acceptable behaviour, cultural reasoning is essential for identity-aware AI. We argue that cultural reasoning must be treated as a foundational capability alongside factual accuracy and linguistic coherence.
- Score: 3.1866496693431934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models have become the latest trend in natural language processing, heavily featuring in the digital tools we use every day. However, their replies often reflect a narrow cultural viewpoint that overlooks the diversity of global users. This missing capability could be referred to as cultural reasoning, which we define here as the capacity of a model to recognise culture-specific knowledge, values, and social norms, and to adjust its output so that it aligns with the expectations of individual users. Because culture shapes interpretation, emotional resonance, and acceptable behaviour, cultural reasoning is essential for identity-aware AI. When this capacity is limited or absent, models can sustain stereotypes, ignore minority perspectives, erode trust, and perpetuate hate. Recent empirical studies strongly suggest that current models default to Western norms when judging moral dilemmas, interpreting idioms, or offering advice, and that fine-tuning on survey data only partly reduces this tendency. The present evaluation methods mainly report static accuracy scores and thus fail to capture adaptive reasoning in context. Although broader datasets can help, they cannot alone ensure genuine cultural competence. Therefore, we argue that cultural reasoning must be treated as a foundational capability alongside factual accuracy and linguistic coherence. By clarifying the concept and outlining initial directions for its assessment, we lay a foundation for future systems to respond with greater sensitivity to the complex fabric of human culture.
Related papers
- CURE: Cultural Understanding and Reasoning Evaluation - A Framework for "Thick" Culture Alignment Evaluation in LLMs [24.598338950728234]
Large language models (LLMs) are increasingly deployed in culturally diverse environments. Existing methods focus on de-contextualized correctness or forced-choice judgments. We introduce a set of benchmarks that present models with realistic situational contexts.
arXiv Detail & Related papers (2025-11-15T03:39:13Z)
- CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models. Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification. Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z)
- Culture is Everywhere: A Call for Intentionally Cultural Evaluation [36.20861746863831]
We argue for intentionally cultural evaluation: an approach that systematically examines the cultural assumptions embedded in all aspects of evaluation. We discuss implications and future directions for moving beyond current benchmarking practices.
arXiv Detail & Related papers (2025-09-01T09:39:21Z)
- CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation [61.130639734982395]
We introduce CAIRe, a novel evaluation metric that assesses the degree of cultural relevance of an image. Our framework grounds entities and concepts in the image to a knowledge base and uses factual information to give independent graded judgments for each culture label.
arXiv Detail & Related papers (2025-06-10T17:16:23Z)
- From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association Test [50.51344198689069]
We extend the human-centered word association test (WAT) to assess the alignment of large language models with cross-cultural cognition. To address culture preference, we propose CultureSteer, an approach that embeds culture-specific semantic associations directly within the model's internal representation space.
arXiv Detail & Related papers (2025-05-24T07:05:10Z)
- From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs [62.9861554207279]
Adapting cultural values in Large Language Models (LLMs) presents significant challenges. Prior work primarily aligns LLMs with different cultural values using World Values Survey (WVS) data. We investigate WVS-based training for cultural value adaptation and find that relying solely on survey data can homogenize cultural norms and interfere with factual knowledge.
arXiv Detail & Related papers (2025-05-22T09:00:01Z)
- Risks of Cultural Erasure in Large Language Models [4.613949381428196]
We argue for the need for metricizable evaluations of language technologies that interrogate and account for historical power inequities. We probe representations that a language model produces about different places around the world when asked to describe these contexts. We analyze the cultures represented in the travel recommendations produced by a set of language model applications.
arXiv Detail & Related papers (2025-01-02T04:57:50Z)
- Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models [4.771099208181585]
As LLMs are increasingly deployed in global applications, it is important that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. We present two key contributions: a cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and a culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators.
arXiv Detail & Related papers (2024-10-15T18:13:10Z)
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- Cultural Bias and Cultural Alignment of Large Language Models [0.9374652839580183]
We conduct a disaggregated evaluation of cultural bias for five widely used large language models.
All models exhibit cultural values resembling English-speaking and Protestant European countries.
We suggest using cultural prompting and ongoing evaluation to reduce cultural bias in the output of generative AI.
arXiv Detail & Related papers (2023-11-23T16:45:56Z)
- Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models [89.94270049334479]
This paper identifies a cultural dominance issue within large language models (LLMs).
LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages.
arXiv Detail & Related papers (2023-10-19T05:38:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.