Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages
- URL: http://arxiv.org/abs/2510.05291v1
- Date: Mon, 06 Oct 2025 18:59:11 GMT
- Title: Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages
- Authors: Tarek Naous, Anagha Savit, Carlos Rafael Catalan, Geyang Guo, Jaehyeok Lee, Kyungdon Lee, Lheane Marie Dizon, Mengyu Ye, Neel Kothari, Sahajpreet Singh, Sarah Masud, Tanish Patwa, Trung Thanh Tran, Zohaib Khan, Alan Ritter, JinYeong Bak, Keisuke Sakaguchi, Tanmoy Chakraborty, Yuki Arase, Wei Xu,
- Abstract summary: We introduce Camellia, a benchmark for measuring entity-centric cultural biases in nine Asian languages spanning six distinct Asian cultures.<n>We evaluate cultural biases in four recent multilingual Large Language Models across various tasks such as cultural context adaptation, sentiment association, and entity extractive QA.<n>Our analyses show a struggle by LLMs at cultural adaptation in all Asian languages, with performance differing across models developed in regions with varying access to culturally-relevant data.
- Score: 46.3747338016989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Large Language Models (LLMs) gain stronger multilingual capabilities, their ability to handle culturally diverse entities becomes crucial. Prior work has shown that LLMs often favor Western-associated entities in Arabic, raising concerns about cultural fairness. Due to the lack of multilingual benchmarks, it remains unclear if such biases also manifest in different non-Western languages. In this paper, we introduce Camellia, a benchmark for measuring entity-centric cultural biases in nine Asian languages spanning six distinct Asian cultures. Camellia includes 19,530 entities manually annotated for association with the specific Asian or Western culture, as well as 2,173 naturally occurring masked contexts for entities derived from social media posts. Using Camellia, we evaluate cultural biases in four recent multilingual LLM families across various tasks such as cultural context adaptation, sentiment association, and entity extractive QA. Our analyses show a struggle by LLMs at cultural adaptation in all Asian languages, with performance differing across models developed in regions with varying access to culturally-relevant data. We further observe that different LLM families hold their distinct biases, differing in how they associate cultures with particular sentiments. Lastly, we find that LLMs struggle with context understanding in Asian languages, creating performance gaps between cultures in entity extraction.
Related papers
- LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction [57.23766971626989]
Large language models (LLMs) encode rich cultural knowledge learned from diverse web-scale data.<n>We present an iterative, prompt-based framework for constructing a Cultural Commonsense Knowledge Graph (CCKG)<n>We find that the cultural knowledge graphs are better realized in English, even when the target culture is non-English.
arXiv Detail & Related papers (2026-01-25T20:05:04Z) - LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing [0.21485350418225244]
Large Language Models (LLMs) are rapidly being adopted by users across the globe, who interact with them in a diverse range of languages.<n>We examine how prompt language and cultural framing influence model responses and their alignment with human values in different countries.
arXiv Detail & Related papers (2025-11-06T02:09:29Z) - Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World [68.19795061447044]
This paper investigates cross-cultural transfer of commonsense reasoning in the Arab world.<n>Using a culturally grounded commonsense reasoning dataset covering 13 Arab countries, we evaluate lightweight alignment methods.<n>Our results show that merely 12 culture-specific examples from one country can improve performance in others by 10% on average.
arXiv Detail & Related papers (2025-09-23T17:24:14Z) - Self-Pluralising Culture Alignment for Large Language Models [36.689491885394034]
We propose CultureSPA, a framework that allows large language models to align to pluralistic cultures.
By comparing culture-aware/unaware outputs, we are able to detect and collect culture-related instances.
Extensive experiments demonstrate that CultureSPA significantly improves the alignment of LLMs to diverse cultures without compromising general abilities.
arXiv Detail & Related papers (2024-10-16T19:06:08Z) - Analyzing Cultural Representations of Emotions in LLMs through Mixed Emotion Survey [2.9213203896291766]
This study focuses on analyzing the cultural representations of emotions in Large Language Models (LLMs)
Our methodology is based on the studies of Miyamoto et al. (2010), which identified distinctive emotional indicators in Japanese and American human responses.
We find that models have limited alignment with the evidence in the literature.
arXiv Detail & Related papers (2024-08-04T20:56:05Z) - Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [98.09670425244462]
Large language models (LLMs) have demonstrated substantial commonsense understanding.
This paper examines the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks.
arXiv Detail & Related papers (2024-05-07T20:28:34Z) - CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations.
We discover that culture-conditioned generation consist of linguistic "markers" that distinguish marginalized cultures apart from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z) - Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge [47.57055368312541]
We introduce FmLAMA, a multilingual dataset centered on food-related cultural facts and variations in food practices.<n>We analyze LLMs across various architectures and configurations, evaluating their performance in both monolingual and multilingual settings.
arXiv Detail & Related papers (2024-04-10T08:49:27Z) - Having Beer after Prayer? Measuring Cultural Bias in Large Language Models [25.722262209465846]
We show that multilingual and Arabic monolingual LMs exhibit bias towards entities associated with Western culture.
We introduce CAMeL, a novel resource of 628 naturally-occurring prompts and 20,368 entities spanning eight types that contrast Arab and Western cultures.
Using CAMeL, we examine the cross-cultural performance in Arabic of 16 different LMs on tasks such as story generation, NER, and sentiment analysis.
arXiv Detail & Related papers (2023-05-23T18:27:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.