Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models
- URL: http://arxiv.org/abs/2508.08879v1
- Date: Tue, 12 Aug 2025 12:05:32 GMT
- Title: Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models
- Authors: Haeun Yu, Seogyeong Jeong, Siddhesh Pawar, Jisu Shin, Jiho Jin, Junho Myung, Alice Oh, Isabelle Augenstein
- Abstract summary: We propose CultureScope, the first interpretability-based method that probes the internal representations of large language models. We introduce a cultural flattening score as a measure of intrinsic cultural biases. Our experimental results reveal that LLMs encode Western-dominance bias and cultural flattening in their cultural knowledge space.
- Score: 42.367959511140036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing deployment of large language models (LLMs) across diverse cultural contexts necessitates a better understanding of how the overgeneralization of less documented cultures within LLMs' representations impacts their cultural understanding. Prior work only performs extrinsic evaluation of LLMs' cultural competence, without accounting for how LLMs' internal mechanisms lead to cultural (mis)representation. To bridge this gap, we propose CultureScope, the first mechanistic interpretability-based method that probes the internal representations of LLMs to elicit the underlying cultural knowledge space. CultureScope utilizes a patching method to extract the cultural knowledge. We introduce a cultural flattening score as a measure of the intrinsic cultural biases. Additionally, we study how LLMs internalize Western-dominance bias and cultural flattening, which allows us to trace how cultural biases emerge within LLMs. Our experimental results reveal that LLMs encode Western-dominance bias and cultural flattening in their cultural knowledge space. We find that low-resource cultures are less susceptible to cultural biases, likely due to their limited training resources. Our work provides a foundation for future research on mitigating cultural biases and enhancing LLMs' cultural understanding. Our code and data used for experiments are publicly available.
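The abstract introduces a "cultural flattening score" but does not spell out its formula here. One minimal, illustrative way to quantify flattening is the mean pairwise cosine similarity between per-culture representation vectors extracted from a model; the function below is a sketch under that assumption, not the paper's actual definition:

```python
import numpy as np

def cultural_flattening_score(culture_vecs: np.ndarray) -> float:
    """Mean pairwise cosine similarity across per-culture representation vectors.

    A higher score means the representations of different cultures are more
    similar to each other, i.e. more "flattened". This formula is an
    illustrative assumption; the paper's actual metric may differ.
    """
    # L2-normalize each culture's representation vector.
    norms = np.linalg.norm(culture_vecs, axis=1, keepdims=True)
    unit = culture_vecs / norms
    sims = unit @ unit.T                     # cosine similarity matrix
    n = len(culture_vecs)
    off_diag = sims[~np.eye(n, dtype=bool)]  # drop self-similarities
    return float(off_diag.mean())

# Toy example: three near-identical vectors score close to 1 (flattened),
# while orthogonal vectors score 0 (distinct cultural representations).
flat = cultural_flattening_score(np.array([[1.0, 0.0], [0.99, 0.01], [1.0, 0.01]]))
distinct = cultural_flattening_score(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

In a real probing setup, each row would come from the model's hidden activations for prompts about a given culture (e.g. via the paper's patching method), rather than the toy vectors used here.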
Related papers
- LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction [57.23766971626989]
Large language models (LLMs) encode rich cultural knowledge learned from diverse web-scale data. We present an iterative, prompt-based framework for constructing a Cultural Commonsense Knowledge Graph (CCKG). We find that the cultural knowledge graphs are better realized in English, even when the target culture is non-English.
arXiv Detail & Related papers (2026-01-25T20:05:04Z) - CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models. Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification. Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z) - From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs [57.43233760384488]
Adapting cultural values in Large Language Models (LLMs) presents significant challenges. Prior work primarily aligns LLMs with different cultural values using World Values Survey (WVS) data. In this paper, we investigate WVS-based training for cultural value adaptation and find that relying solely on survey data can homogenize cultural norms and interfere with factual knowledge.
arXiv Detail & Related papers (2025-05-22T09:00:01Z) - An Evaluation of Cultural Value Alignment in LLM [27.437888319382893]
We conduct the first large-scale evaluation of LLM culture assessing 20 countries' cultures and languages across ten LLMs. Our findings show that the output over all models represents a moderate cultural middle ground. Deeper investigation sheds light on the influence of model origin, prompt language, and value dimensions on cultural output.
arXiv Detail & Related papers (2025-04-11T09:13:19Z) - Through the Prism of Culture: Evaluating LLMs' Understanding of Indian Subcultures and Traditions [9.357186653223332]
We evaluate the capacity of Large Language Models to recognize and accurately respond to the Little Traditions within Indian society. Through a series of case studies, we assess whether LLMs can balance the interplay between dominant Great Traditions and localized Little Traditions. Our findings reveal that while LLMs demonstrate an ability to articulate cultural nuances, they often struggle to apply this understanding in practical, context-specific scenarios.
arXiv Detail & Related papers (2025-01-28T06:58:25Z) - Survey of Cultural Awareness in Language Models: Text and Beyond [39.77033652289063]
Large-scale deployment of large language models (LLMs) in various applications requires LLMs to be culturally sensitive to the user to ensure inclusivity.
Culture has been widely studied in psychology and anthropology, and there has been a recent surge in research on making LLMs more culturally inclusive.
arXiv Detail & Related papers (2024-10-30T16:37:50Z) - Self-Pluralising Culture Alignment for Large Language Models [36.689491885394034]
We propose CultureSPA, a framework that allows large language models to align to pluralistic cultures.
By comparing culture-aware/unaware outputs, we are able to detect and collect culture-related instances.
Extensive experiments demonstrate that CultureSPA significantly improves the alignment of LLMs to diverse cultures without compromising general abilities.
arXiv Detail & Related papers (2024-10-16T19:06:08Z) - Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [98.09670425244462]
Large language models (LLMs) have demonstrated substantial commonsense understanding.
This paper examines the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks.
arXiv Detail & Related papers (2024-05-07T20:28:34Z) - CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover the cultural perceptions of three SOTA models on 110 countries and regions across 8 culture-related topics through culture-conditioned generations.
We discover that culture-conditioned generations consist of linguistic "markers" that distinguish marginalized cultures from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.