Related papers: The Curious Case of Curiosity across Human Cultures and LLMs

The Curious Case of Curiosity across Human Cultures and LLMs

URL: http://arxiv.org/abs/2510.12943v2
Date: Mon, 20 Oct 2025 15:55:28 GMT
Title: The Curious Case of Curiosity across Human Cultures and LLMs
Authors: Angana Borah, Zhijing Jin, Rada Mihalcea,
Abstract summary: We investigate cultural variation in curiosity using Yahoo! Answers, a real-world multi-country dataset spanning diverse topics.<n>We find that Large Language Models flatten cross-cultural diversity, aligning more closely with how curiosity is expressed in Western countries.<n>We then explore fine-tuning strategies to induce curiosity in LLMs, narrowing the human-model alignment gap by up to 50%.
Score: 45.37389175832353
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Large Language Models (LLMs) have expanded their role in human interaction, yet curiosity -- a central driver of inquiry -- remains underexplored in these systems, particularly across cultural contexts. In this work, we investigate cultural variation in curiosity using Yahoo! Answers, a real-world multi-country dataset spanning diverse topics. We introduce CUEST (CUriosity Evaluation across SocieTies), an evaluation framework that measures human-model alignment in curiosity through linguistic (style), topic preference (content) analysis and grounding insights in social science constructs. Across open- and closed-source models, we find that LLMs flatten cross-cultural diversity, aligning more closely with how curiosity is expressed in Western countries. We then explore fine-tuning strategies to induce curiosity in LLMs, narrowing the human-model alignment gap by up to 50%. Finally, we demonstrate the practical value of curiosity for LLM adaptability across cultures, showing its importance for future NLP research.

Related papers

LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing [0.21485350418225244]
Large Language Models (LLMs) are rapidly being adopted by users across the globe, who interact with them in a diverse range of languages.<n>We examine how prompt language and cultural framing influence model responses and their alignment with human values in different countries.
arXiv Detail & Related papers (2025-11-06T02:09:29Z)
Why Did Apple Fall To The Ground: Evaluating Curiosity In Large Language Model [67.37154331548413]
We design a comprehensive evaluation framework to assess the extent of curiosity exhibited by large language models (LLMs)<n>The results demonstrate that LLMs exhibit a stronger thirst for knowledge than humans but still tend to make conservative choices when faced with uncertain environments.<n>These findings suggest that LLMs have the potential to exhibit curiosity similar to that of humans, providing experimental support for the future development of learning capabilities.
arXiv Detail & Related papers (2025-10-23T15:05:17Z)
I Am Aligned, But With Whom? MENA Values Benchmark for Evaluating Cultural Alignment and Multilingual Bias in LLMs [5.060243371992739]
We introduce MENAValues, a novel benchmark designed to evaluate the cultural alignment and multilingual biases of large language models (LLMs)<n> Drawing from large-scale, authoritative human surveys, we curate a structured dataset that captures the sociocultural landscape of MENA with population-level response distributions from 16 countries.<n>Our analysis reveals three critical phenomena: "Cross-Lingual Value Shifts" where identical questions yield drastically different responses based on language, "Reasoning-Induced Degradation" where prompting models to explain their reasoning worsens cultural alignment, and "Logit Leakage" where models refuse sensitive questions while internal probabilities reveal strong hidden
arXiv Detail & Related papers (2025-10-15T05:10:57Z)
MMA-ASIA: A Multilingual and Multimodal Alignment Framework for Culturally-Grounded Evaluation [91.22008265721952]
MMA-ASIA centers on a human-curated, multilingual, and multimodally aligned benchmark covering 8 Asian countries and 10 languages.<n>This is the first dataset aligned at the input level across three modalities: text, image (visual question answering), and speech.<n>We propose a five-dimensional evaluation protocol that measures: (i) cultural-awareness disparities across countries, (ii) cross-lingual consistency, (iii) cross-modal consistency, (iv) cultural knowledge generalization, and (v) grounding validity.
arXiv Detail & Related papers (2025-10-07T14:12:12Z)
From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association Test [50.51344198689069]
We extend the human-centered word association test (WAT) to assess the alignment of large language models with cross-cultural cognition.<n>To address culture preference, we propose CultureSteer, an innovative approach by embedding cultural-specific semantic associations directly within the model's internal representation space.
arXiv Detail & Related papers (2025-05-24T07:05:10Z)
From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs [62.9861554207279]
Adapting cultural values in Large Language Models (LLMs) presents significant challenges.<n>Prior work primarily aligns LLMs with different cultural values using World Values Survey (WVS) data.<n>We investigate WVS-based training for cultural value adaptation and find that relying solely on survey data cane cultural norms and interfere with factual knowledge.
arXiv Detail & Related papers (2025-05-22T09:00:01Z)
Through the Prism of Culture: Evaluating LLMs' Understanding of Indian Subcultures and Traditions [9.331687165284587]
We evaluate the capacity of Large Language Models to recognize and accurately respond to the Little Traditions within Indian society.<n>Through a series of case studies, we assess whether LLMs can balance the interplay between dominant Great Traditions and localized Little Traditions.<n>Our findings reveal that while LLMs demonstrate an ability to articulate cultural nuances, they often struggle to apply this understanding in practical, context-specific scenarios.
arXiv Detail & Related papers (2025-01-28T06:58:25Z)
Exploring Large Language Models on Cross-Cultural Values in Connection with Training Methodology [4.079147243688765]
Large language models (LLMs) closely interact with humans, and need an intimate understanding of the cultural values of human society.<n>Our analysis shows that LLMs can judge socio-cultural norms similar to humans but less so on social systems and progress.<n>Increasing model size helps a better understanding of social values, but smaller models can be enhanced by using synthetic data.
arXiv Detail & Related papers (2024-12-12T00:52:11Z)
Survey of Cultural Awareness in Language Models: Text and Beyond [39.77033652289063]
Large-scale deployment of large language models (LLMs) in various applications requires LLMs to be culturally sensitive to the user to ensure inclusivity. Culture has been widely studied in psychology and anthropology, and there has been a recent surge in research on making LLMs more culturally inclusive.
arXiv Detail & Related papers (2024-10-30T16:37:50Z)
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models [59.22460740026037]
"CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset is designed to evaluate the social and cultural variation of Large Language Models (LLMs) We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy.
arXiv Detail & Related papers (2024-05-22T20:19:10Z)
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge [69.82940934994333]
We introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build challenging evaluation dataset. Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating cultural questions. CULTURALBENCH-V0.1 is a compact yet high-quality evaluation dataset with users' red-teaming attempts.
arXiv Detail & Related papers (2024-04-10T00:25:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.