Related papers: ALIGN: Word Association Learning for Cross-Cultural Generalization in Large Language Models

URL: http://arxiv.org/abs/2508.13426v1
Date: Tue, 19 Aug 2025 00:55:20 GMT
Title: ALIGN: Word Association Learning for Cross-Cultural Generalization in Large Language Models
Authors: Chunhua Liu, Kabir Manandhar Shrestha, Sukai Huang,
Abstract summary: It remains a challenge to model and align culture due to limited cultural knowledge. We introduce parameter-efficient fine-tuning on native speakers' free word-association norms. Our work shows that a few million culture-grounded associations can instill value alignment without costly retraining.
Score: 0.8999666725996975
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: As large language models (LLMs) increasingly mediate cross-cultural communication, their behavior still reflects the distributional bias of the languages and viewpoints that are over-represented in their pre-training corpora. Yet, it remains a challenge to model and align culture due to limited cultural knowledge and a lack of exploration into effective learning approaches. We introduce a cost-efficient, cognitively grounded remedy: parameter-efficient fine-tuning on native speakers' free word-association norms, which encode implicit cultural schemas. Leveraging English-US and Mandarin associations from the Small-World-of-Words project, we adapt Llama-3.1-8B and Qwen-2.5-7B via supervised fine-tuning (SFT) and PPO-based preference optimization. SFT boosts held-out association Precision at 5 by 16-20% in English and 43-165% in Mandarin, lifts median concreteness by +0.20, and attains human-level valence and arousal. These lexical gains transfer: on World-Values-Survey questions, fine-tuned models shift answer distributions toward the target culture, and on a 50-item high-tension subset, Qwen's Chinese-aligned responses double while Llama's US bias drops by one-third. Our 7-8B models rival or beat vanilla 70B baselines, showing that a few million culture-grounded associations can instill value alignment without costly retraining. Our work highlights both the promise and the need for future research grounded in human cognition in improving cultural alignment in AI models.
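The abstract reports gains in held-out association Precision at 5 but does not spell out the computation. The following is a minimal sketch under one common reading of the metric: the fraction of a model's top 5 generated associates for a cue word that also appear among human responses to that cue. The function name `precision_at_k`, the lowercase normalization, and the toy word lists are assumptions made for illustration; they are not taken from the paper or from the Small-World-of-Words data.

```python
from typing import Iterable, Sequence


def precision_at_k(predicted: Sequence[str], human: Iterable[str], k: int = 5) -> float:
    """Fraction of the top-k predicted associates that occur in the human response set.

    Assumption: the denominator is always k, so a model that returns fewer
    than k associates is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Normalize both sides the same way so that "Cat" and "cat " match.
    gold = {word.strip().lower() for word in human}
    top_k = [word.strip().lower() for word in predicted[:k]]
    hits = sum(1 for word in top_k if word in gold)
    return hits / k


# Toy data invented for illustration only (cue word: "dog").
human_norms = {"cat", "bark", "pet", "bone", "puppy", "leash"}
model_output = ["cat", "pet", "wolf", "bone", "house"]

score = precision_at_k(model_output, human_norms, k=5)
print(score)
```

In this toy case three of the five generated words ("cat", "pet", "bone") occur in the human set. Averaging such per-cue scores over a held-out set of cues would give a corpus-level figure, though the paper's exact aggregation may differ.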

Related papers

LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing [0.21485350418225244]
Large Language Models (LLMs) are rapidly being adopted by users across the globe, who interact with them in a diverse range of languages. We examine how prompt language and cultural framing influence model responses and their alignment with human values in different countries.
arXiv Detail & Related papers (2025-11-06T02:09:29Z)
I Am Aligned, But With Whom? MENA Values Benchmark for Evaluating Cultural Alignment and Multilingual Bias in LLMs [5.060243371992739]
We introduce MENAValues, a novel benchmark designed to evaluate the cultural alignment and multilingual biases of large language models (LLMs). Drawing from large-scale, authoritative human surveys, we curate a structured dataset that captures the sociocultural landscape of MENA with population-level response distributions from 16 countries. Our analysis reveals three critical phenomena: "Cross-Lingual Value Shifts" where identical questions yield drastically different responses based on language, "Reasoning-Induced Degradation" where prompting models to explain their reasoning worsens cultural alignment, and "Logit Leakage" where models refuse sensitive questions while internal probabilities reveal strong hidden
arXiv Detail & Related papers (2025-10-15T05:10:57Z)
MMA-ASIA: A Multilingual and Multimodal Alignment Framework for Culturally-Grounded Evaluation [91.22008265721952]
MMA-ASIA centers on a human-curated, multilingual, and multimodally aligned benchmark covering 8 Asian countries and 10 languages. This is the first dataset aligned at the input level across three modalities: text, image (visual question answering), and speech. We propose a five-dimensional evaluation protocol that measures: (i) cultural-awareness disparities across countries, (ii) cross-lingual consistency, (iii) cross-modal consistency, (iv) cultural knowledge generalization, and (v) grounding validity.
arXiv Detail & Related papers (2025-10-07T14:12:12Z)
From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association Test [50.51344198689069]
We extend the human-centered word association test (WAT) to assess the alignment of large language models with cross-cultural cognition. To address culture preference, we propose CultureSteer, an approach that embeds culture-specific semantic associations directly within the model's internal representation space.
arXiv Detail & Related papers (2025-05-24T07:05:10Z)
CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization [50.90288681622152]
Large Language Models (LLMs) are becoming more deeply integrated into human life across various regions. Existing approaches develop culturally aligned LLMs through fine-tuning with culture-specific corpora. We introduce CAReDiO, a novel cultural data construction framework.
arXiv Detail & Related papers (2025-04-09T13:40:13Z)
CARE: Multilingual Human Preference Learning for Cultural Awareness [48.760262639641496]
We introduce CARE, a multilingual resource containing 3,490 culturally specific questions and 31.7k responses with human judgments. We demonstrate how a modest amount of high-quality native preferences improves cultural awareness across various LMs. Our analyses reveal that models with stronger initial cultural performance benefit more from alignment.
arXiv Detail & Related papers (2025-04-07T14:57:06Z)
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench [37.63947763066401]
We introduce CQ-Bench, a benchmark designed to assess large language models' capability to infer implicit cultural values. We generate a dataset of multi-character, conversation-based stories using values from the World Values Survey and GlobalOpinions datasets. We find that while o1 and Deepseek-R1 models reach human-level performance in value selection, they still fall short in nuanced attitude detection. In the value extraction task, GPT-4o-mini and o3-mini score 0.602 and 0.598, highlighting the difficulty of open-ended cultural reasoning.
arXiv Detail & Related papers (2025-04-01T18:54:47Z)
Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs [2.5212698425008377]
Large Language Models (LLMs) are becoming increasingly capable across global languages. However, the ability to communicate across languages does not necessarily translate to appropriate cultural representations. We compare two families of models: Google's Gemma models and OpenAI's turbo-series. We find no consistent relationships between language capabilities and cultural alignment.
arXiv Detail & Related papers (2025-02-23T11:02:41Z)
CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z)
CultureLLM: Incorporating Cultural Differences into Large Language Models [36.66184989869121]
CultureLLM is a cost-effective solution to incorporate cultural differences into large language models. We fine-tune culture-specific LLMs and one unified model (CultureLLM-One) for 9 cultures covering rich and low-resource languages. Our human study shows that the generated samples are semantically equivalent to the original samples.
arXiv Detail & Related papers (2024-02-09T04:02:43Z)
Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models [89.94270049334479]
This paper identifies a cultural dominance issue within large language models (LLMs). LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages.
arXiv Detail & Related papers (2023-10-19T05:38:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.