Related papers: TALES: A Taxonomy and Analysis of Cultural Representations in LLM-generated Stories

TALES: A Taxonomy and Analysis of Cultural Representations in LLM-generated Stories

URL: http://arxiv.org/abs/2511.21322v1
Date: Wed, 26 Nov 2025 12:07:32 GMT
Title: TALES: A Taxonomy and Analysis of Cultural Representations in LLM-generated Stories
Authors: Kirti Bhagat, Shaily Bhatt, Athul Velagapudi, Aditya Vashistha, Shachi Dave, Danish Pruthi,
Abstract summary: We present TALES, an evaluation of cultural misrepresentations in LLM-generated stories for diverse Indian cultural identities.<n>We develop TALES-Tax, a taxonomy of cultural misrepresentations by collating insights from participants with lived experiences in India.<n>We transform the annotations into TALES-QA, a standalone question bank to evaluate the cultural knowledge of foundational models.
Score: 24.375203423945816
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Millions of users across the globe turn to AI chatbots for their creative needs, inviting widespread interest in understanding how such chatbots represent diverse cultures. At the same time, evaluating cultural representations in open-ended tasks remains challenging and underexplored. In this work, we present TALES, an evaluation of cultural misrepresentations in LLM-generated stories for diverse Indian cultural identities. First, we develop TALES-Tax, a taxonomy of cultural misrepresentations by collating insights from participants with lived experiences in India through focus groups (N=9) and individual surveys (N=15). Using TALES-Tax, we evaluate 6 models through a large-scale annotation study spanning 2,925 annotations from 108 annotators with lived cultural experience from across 71 regions in India and 14 languages. Concerningly, we find that 88\% of the generated stories contain one or more cultural inaccuracies, and such errors are more prevalent in mid- and low-resourced languages and stories based in peri-urban regions in India. Lastly, we transform the annotations into TALES-QA, a standalone question bank to evaluate the cultural knowledge of foundational models. Through this evaluation, we surprisingly discover that models often possess the requisite cultural knowledge despite generating stories rife with cultural misrepresentations.

Related papers

Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge [68.6805229085352]
Most multilingual question-answering benchmarks do not factor in regional diversity in the information they capture.<n>XNationQA encompasses a total of 49,280 questions on the geography, culture, and history of nine countries, presented in seven languages.<n>We benchmark eight standard multilingual LLMs on XNationQA and evaluate them using two novel transference metrics.
arXiv Detail & Related papers (2025-11-01T18:41:34Z)
From Facts to Folklore: Evaluating Large Language Models on Bengali Cultural Knowledge [7.322034156204158]
We show that large language models (LLMs) struggle with cultural knowledge and performance when context is provided.<n>Our work addresses these limitations through a Bengali Language Cultural Knowledge dataset including folk traditions, culinary arts, and regional dialects.<n>Our investigation of several multilingual language models shows that while these models perform well in non-cultural categories, they struggle significantly with cultural knowledge and performance improves substantially when context is provided.
arXiv Detail & Related papers (2025-10-22T21:42:59Z)
Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World [68.19795061447044]
This paper investigates cross-cultural transfer of commonsense reasoning in the Arab world.<n>Using a culturally grounded commonsense reasoning dataset covering 13 Arab countries, we evaluate lightweight alignment methods.<n>Our results show that merely 12 culture-specific examples from one country can improve performance in others by 10% on average.
arXiv Detail & Related papers (2025-09-23T17:24:14Z)
CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models.<n>Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification.<n> Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z)
SANSKRITI: A Comprehensive Benchmark for Evaluating Language Models' Knowledge of Indian Culture [17.414561140897945]
We introduce SANSKRITI, a benchmark designed to evaluate language models' comprehension of India's rich cultural diversity.<n>Comprising 21,853 meticulously curated question-answer pairs spanning 28 states and 8 union territories, SANSKRITI is the largest dataset for testing Indian cultural knowledge.<n>It covers sixteen key attributes of Indian culture: rituals and ceremonies, history, tourism, cuisine, dance and music, costume, language, art, festivals, religion, medicine, transport, sports, nightlife, and personalities.
arXiv Detail & Related papers (2025-06-18T11:19:25Z)
MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs [37.98920430188422]
MAKIEval is an automatic multilingual framework for evaluating cultural awareness in large language models.<n>It automatically identifies cultural entities in model outputs and links them to structured knowledge.<n>We assess 7 LLMs developed from different parts of the world, encompassing both open-source and proprietary systems.
arXiv Detail & Related papers (2025-05-27T19:29:40Z)
From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs [62.9861554207279]
Adapting cultural values in Large Language Models (LLMs) presents significant challenges.<n>Prior work primarily aligns LLMs with different cultural values using World Values Survey (WVS) data.<n>We investigate WVS-based training for cultural value adaptation and find that relying solely on survey data cane cultural norms and interfere with factual knowledge.
arXiv Detail & Related papers (2025-05-22T09:00:01Z)
DaKultur: Evaluating the Cultural Awareness of Language Models for Danish with Native Speakers [17.355452637877402]
We conduct the first cultural evaluation study for the mid-resource language of Danish, in which native speakers prompt different models to solve tasks requiring cultural awareness.<n>Our analysis of the resulting 1,038 interactions from 63 demographically diverse participants highlights open challenges to cultural adaptation.
arXiv Detail & Related papers (2025-04-03T08:52:42Z)
CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z)
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations. We discover that culture-conditioned generation consist of linguistic "markers" that distinguish marginalized cultures apart from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z)
Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions [10.415002561977655]
This research proposes a Cultural Alignment Test (Hoftede's CAT) to quantify cultural alignment using Hofstede's cultural dimension framework. We quantitatively evaluate large language models (LLMs) against the cultural dimensions of regions like the United States, China, and Arab countries. Our results quantify the cultural alignment of LLMs and reveal the difference between LLMs in explanatory cultural dimensions.
arXiv Detail & Related papers (2023-08-25T14:50:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.