Related papers: Common to Whom? Regional Cultural Commonsense and LLM Bias in India

URL: http://arxiv.org/abs/2601.15550v2
Date: Wed, 28 Jan 2026 15:00:16 GMT
Title: Common to Whom? Regional Cultural Commonsense and LLM Bias in India
Authors: Sangmitra Madhusudan, Trush Shashank More, Steph Buongiorno, Renata Dividino, Jad Kabbara, Ali Emami
Abstract summary: We introduce Indica, the first benchmark designed to test whether LLMs capture sub-national variation in cultural commonsense. We collect human-annotated answers from five Indian regions across 515 questions spanning 8 domains of everyday life. Strikingly, only 39.4% of questions elicit agreement across all five regions.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing cultural commonsense benchmarks treat nations as monolithic, assuming uniform practices within national boundaries. But does cultural commonsense hold uniformly within a nation, or does it vary at the sub-national level? We introduce Indica, the first benchmark designed to test LLMs' ability to address this question, focusing on India, a nation of 28 states, 8 union territories, and 22 official languages. We collect human-annotated answers from five Indian regions (North, South, East, West, and Central) across 515 questions spanning 8 domains of everyday life, yielding 1,630 region-specific question-answer pairs. Strikingly, only 39.4% of questions elicit agreement across all five regions, demonstrating that cultural commonsense in India is predominantly regional, not national. We evaluate eight state-of-the-art LLMs and find two critical gaps: models achieve only 13.4%-20.9% accuracy on region-specific questions, and they exhibit geographic bias, over-selecting Central and North India as the "default" (selected 30-40% more often than expected) while under-representing East and West. Beyond India, our methodology provides a generalizable framework for evaluating cultural commonsense in any culturally heterogeneous nation, from question design grounded in anthropological taxonomy, to regional data collection, to bias measurement.
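The two headline measurements in the abstract (agreement across all five regions, and how much more often a model selects a region than expected) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Indica authors' code: it assumes each question has exactly one answer string per region, treats "agreement" as an exact match after trimming and lowercasing, and takes the "expected" selection rate to be a uniform 1/5 share per region. The paper's actual matching criterion and baseline may differ. The function names (`full_agreement_rate`, `selection_ratios`) and the toy data are hypothetical.

```python
from collections import Counter

# The five regions used in Indica.
REGIONS = ("North", "South", "East", "West", "Central")


def full_agreement_rate(answers_by_question):
    """Fraction of questions whose answer is identical in every region.

    answers_by_question: {question_id: {region: answer_string}}.
    Assumption: every region in REGIONS has an answer, and agreement means
    an exact match after trimming whitespace and lowercasing.
    """
    if not answers_by_question:
        return 0.0
    agreed = 0
    for region_answers in answers_by_question.values():
        distinct = {region_answers[r].strip().lower() for r in REGIONS}
        if len(distinct) == 1:  # all five regions gave the same answer
            agreed += 1
    return agreed / len(answers_by_question)


def selection_ratios(model_picks, expected_share=None):
    """Observed share of model picks per region divided by the expected share.

    model_picks: list of region names, one per model answer.
    expected_share: {region: share}; defaults to a uniform 1/5 per region
    (an assumption, not necessarily the paper's baseline).
    A ratio of 1.3 means the region is picked 30% more often than expected.
    """
    if not model_picks:
        raise ValueError("model_picks must be non-empty")
    if expected_share is None:
        expected_share = {r: 1 / len(REGIONS) for r in REGIONS}
    counts = Counter(model_picks)
    total = len(model_picks)
    return {r: (counts[r] / total) / expected_share[r] for r in REGIONS}


# Hypothetical toy data, not taken from Indica.
TOY_ANSWERS = {
    "q1": {r: "Rice" for r in REGIONS},  # every region agrees
    "q2": {"North": "roti", "South": "rice", "East": "rice",
           "West": "roti", "Central": "roti"},  # regions disagree
}
TOY_PICKS = (["Central"] * 14 + ["North"] * 13 + ["South"] * 10
             + ["East"] * 6 + ["West"] * 7)  # 50 picks in total

if __name__ == "__main__":
    print(full_agreement_rate(TOY_ANSWERS))  # 0.5
    ratios = selection_ratios(TOY_PICKS)
    print({r: round(v, 2) for r, v in ratios.items()})
    # {'North': 1.3, 'South': 1.0, 'East': 0.6, 'West': 0.7, 'Central': 1.4}
```

Under these assumptions, ratios of 1.3 to 1.4 are what "selected 30-40% more often than expected" would look like, and ratios below 1 mark under-represented regions. Passing a non-uniform `expected_share` changes only the denominator of each ratio.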

Related papers

Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge
Most multilingual question-answering benchmarks do not factor in regional diversity in the information they capture. XNationQA encompasses a total of 49,280 questions on the geography, culture, and history of nine countries, presented in seven languages. We benchmark eight standard multilingual LLMs on XNationQA and evaluate them using two novel transference metrics.
arXiv (2025-11-01T18:41:34Z)
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
We present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages. The 116 language varieties in Global PIQA cover five continents, 14 language families, and 23 writing systems. In the non-parallel split of Global PIQA, over 50% of examples reference local foods, customs, traditions, or other culturally-specific elements.
arXiv (2025-10-28T05:46:25Z)
Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World
This paper investigates cross-cultural transfer of commonsense reasoning in the Arab world. Using a culturally grounded commonsense reasoning dataset covering 13 Arab countries, we evaluate lightweight alignment methods. Our results show that merely 12 culture-specific examples from one country can improve performance in others by 10% on average.
arXiv (2025-09-23T17:24:14Z)
DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context
Large language models (LLMs) are widely used in various tasks and applications. They are shown to lack cultural alignment due to a lack of cultural knowledge and competence. We introduce a novel culture-specific item (CSI) dataset for Indian culture, spanning 17 cultural facets.
arXiv (2025-09-22T06:58:02Z)
BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context
Existing benchmarks, such as the Bias Benchmark for Question Answering (BBQ), primarily focus on Western contexts. We introduce BharatBBQ, a culturally adapted benchmark designed to assess biases in Hindi, English, Marathi, Bengali, Tamil, Telugu, Odia, and Assamese. Our dataset contains 49,108 examples in one language that are expanded using translation and verification to 392,864 examples in eight different languages.
arXiv (2025-08-09T20:24:24Z)
FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. We introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 socio identity groups.
arXiv (2025-06-29T06:31:06Z)
Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment
Large language models (LLMs) are used worldwide, yet exhibit Western cultural tendencies. We evaluate six Indic and six global LLMs on two dimensions: values and practices. Across tasks, Indic models do not align better with Indian norms than global models.
arXiv (2025-05-25T01:59:23Z)
CulturalBench: A Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming
CulturalBench is a set of 1,696 human-written and human-verified questions to assess LMs' cultural knowledge. It covers 45 global regions including underrepresented ones like Bangladesh, Zimbabwe, and Peru. We construct CulturalBench using methods inspired by Human-AI Red-Teaming.
arXiv (2024-10-03T17:04:31Z)
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
We map the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 Large Language Models (LLMs). With PRISM, we contribute (i) wider geographic and demographic participation in feedback; (ii) census-representative samples for two countries (UK, US); and (iii) individualised ratings that link to detailed participant profiles, permitting personalisation and attribution of sample artefacts. We use PRISM in three case studies to demonstrate the need for careful consideration of which humans provide what alignment data.
arXiv (2024-04-24T17:51:36Z)
IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces
We introduce IndoCulture, aimed at understanding the influence of geographical factors on language model reasoning ability. We ask local people to manually develop a cultural context and plausible options, across a set of predefined topics. Open-weight Llama-3 is competitive with GPT-4, while other open-weight models struggle, with accuracies below 50%.
arXiv (2024-04-02T11:32:58Z)
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
We construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense. We find that the performance of both evaluated vision-and-language models for non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than for Western regions.
arXiv (2021-09-14T17:52:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.