Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment
- URL: http://arxiv.org/abs/2505.21548v2
- Date: Sun, 21 Sep 2025 19:54:06 GMT
- Title: Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment
- Authors: Dhruv Agarwal, Anya Shukla, Sunayana Sitaram, Aditya Vashistha,
- Abstract summary: Large language models (LLMs) are used worldwide, yet exhibit Western cultural tendencies.<n>We evaluate six Indic and six global LLMs on two dimensions -- values and practices.<n>Across tasks, Indic models do not align better with Indian norms than global models.
- Score: 24.871503011248777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are used worldwide, yet exhibit Western cultural tendencies. Many countries are now building ``regional'' LLMs, but it remains unclear whether they reflect local values and practices or merely speak local languages. Using India as a case study, we evaluate six Indic and six global LLMs on two dimensions -- values and practices -- grounded in nationally representative surveys and community-sourced QA datasets. Across tasks, Indic models do not align better with Indian norms than global models; in fact, a U.S. respondent is a closer proxy for Indian values than any Indic model. Prompting and regional fine-tuning fail to recover alignment and can even degrade existing knowledge. We attribute this to scarce culturally grounded data, especially for pretraining. We position cultural evaluation as a first-class requirement alongside multilingual benchmarks and offer a reusable, community-grounded methodology. We call for native, community-authored corpora and thick x wide evaluations to build truly sovereign LLMs.
Related papers
- Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge [68.6805229085352]
Most multilingual question-answering benchmarks do not factor in regional diversity in the information they capture.<n>XNationQA encompasses a total of 49,280 questions on the geography, culture, and history of nine countries, presented in seven languages.<n>We benchmark eight standard multilingual LLMs on XNationQA and evaluate them using two novel transference metrics.
arXiv Detail & Related papers (2025-11-01T18:41:34Z) - Which Cultural Lens Do Models Adopt? On Cultural Positioning Bias and Agentic Mitigation in LLMs [53.07843733899881]
Large language models (LLMs) have unlocked a wide range of downstream generative applications.<n>We find that they also risk perpetuating subtle fairness issues tied to culture, positioning their generations from the perspectives of the mainstream US culture.<n>We propose 2 inference-time mitigation methods to resolve these biases.
arXiv Detail & Related papers (2025-09-25T12:28:25Z) - DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context [7.582991335459645]
Large language models (LLMs) are widely used in various tasks and applications.<n>They are shown to lack cultural alignment due to a lack of cultural knowledge and competence.<n>We introduce a novel CSI dataset for Indian culture, belonging to 17 cultural facets.
arXiv Detail & Related papers (2025-09-22T06:58:02Z) - CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models.<n>Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification.<n> Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z) - From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs [57.43233760384488]
Adapting cultural values in Large Language Models (LLMs) presents significant challenges.<n>Prior work primarily aligns LLMs with different cultural values using World Values Survey (WVS) data.<n>In this paper, we investigate WVS-based training for cultural value adaptation and find that relying solely on survey data cane cultural norms and interfere with factual knowledge.
arXiv Detail & Related papers (2025-05-22T09:00:01Z) - An Evaluation of Cultural Value Alignment in LLM [27.437888319382893]
We conduct the first large-scale evaluation of LLM culture assessing 20 countries' cultures and languages across ten LLMs.<n>Our findings show that the output over all models represents a moderate cultural middle ground.<n> Deeper investigation sheds light on the influence of model origin, prompt language, and value dimensions on cultural output.
arXiv Detail & Related papers (2025-04-11T09:13:19Z) - CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization [50.90288681622152]
Large Language Models (LLMs) more deeply integrate into human life across various regions.<n>Existing approaches develop culturally aligned LLMs through fine-tuning with culture-specific corpora.<n>We introduce CAReDiO, a novel cultural data construction framework.
arXiv Detail & Related papers (2025-04-09T13:40:13Z) - Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs [5.8210182389588105]
Large Language Models (LLMs) are becoming increasingly capable across global languages.<n>However, the ability to communicate across languages does not necessarily translate to appropriate cultural representations.<n>We compare two families of models: Google's Gemma models and OpenAI's turbo-series.<n>We find no consistent relationships between language capabilities and cultural alignment.
arXiv Detail & Related papers (2025-02-23T11:02:41Z) - LIBRA: Measuring Bias of Large Language Model from a Local Context [9.612845616659776]
Large Language Models (LLMs) have significantly advanced natural language processing applications.<n>Yet their widespread use raises concerns regarding inherent biases that may reduce utility or harm for particular social groups.<n>This research addresses these limitations with a Local Integrated Bias Recognition and Assessment Framework (LIBRA) for measuring bias.
arXiv Detail & Related papers (2025-02-02T04:24:57Z) - CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries [63.00147630084146]
Vision-language models (VLMs) have advanced human-AI interaction but struggle with cultural understanding.<n>CultureVerse is a large-scale multimodal benchmark covering 19, 682 cultural concepts, 188 countries/regions, 15 cultural concepts, and 3 question types.<n>We propose CultureVLM, a series of VLMs fine-tuned on our dataset to achieve significant performance improvement in cultural understanding.
arXiv Detail & Related papers (2025-01-02T14:42:37Z) - Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation [71.59208664920452]
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks.<n>We show that progress on MMLU depends heavily on learning Western-centric concepts, with 28% of all questions requiring culturally sensitive knowledge.<n>We release Global MMLU, an improved MMLU with evaluation coverage across 42 languages.
arXiv Detail & Related papers (2024-12-04T13:27:09Z) - Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models [4.771099208181585]
LLMs are increasingly deployed in global applications, ensuring users from diverse backgrounds feel respected and understood.<n>Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values.<n>We present two key contributions: A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and a culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators.
arXiv Detail & Related papers (2024-10-15T18:13:10Z) - Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning [13.034603322224548]
We present a simple and inexpensive method that uses a combination of in-context learning (ICL) and human survey data.
We show that our method could prove useful in test languages other than English and can improve alignment to the cultural values that correspond to a range of culturally diverse countries.
arXiv Detail & Related papers (2024-08-29T12:18:04Z) - BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages [39.17279399722437]
Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages.<n>We introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages.<n>We construct the benchmark to include two formats of questions: short-answer and multiple-choice.
arXiv Detail & Related papers (2024-06-14T11:48:54Z) - CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection.
It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs.
We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z) - NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models [26.64843536942309]
Large language models (LLMs) may need to adapt outputs to user values and cultures, not just know about them.<n>We introduce NormAd, an evaluation framework to assess LLMs' cultural adaptability.<n>We create NormAd-Eti, a benchmark of 2.6k situational descriptions representing social-etiquette related cultural norms from 75 countries.
arXiv Detail & Related papers (2024-04-18T18:48:50Z) - CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations.
We discover that culture-conditioned generation consist of linguistic "markers" that distinguish marginalized cultures apart from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z) - Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in
Large Language Models [89.94270049334479]
This paper identifies a cultural dominance issue within large language models (LLMs)
LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages.
arXiv Detail & Related papers (2023-10-19T05:38:23Z) - Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions [10.415002561977655]
This research proposes a Cultural Alignment Test (Hoftede's CAT) to quantify cultural alignment using Hofstede's cultural dimension framework.
We quantitatively evaluate large language models (LLMs) against the cultural dimensions of regions like the United States, China, and Arab countries.
Our results quantify the cultural alignment of LLMs and reveal the difference between LLMs in explanatory cultural dimensions.
arXiv Detail & Related papers (2023-08-25T14:50:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.