Culture Cartography: Mapping the Landscape of Cultural Knowledge
- URL: http://arxiv.org/abs/2510.27672v1
- Date: Fri, 31 Oct 2025 17:37:34 GMT
- Title: Culture Cartography: Mapping the Landscape of Cultural Knowledge
- Authors: Caleb Ziems, William Held, Jane Yu, Amir Goldberg, David Grusky, Diyi Yang
- Abstract summary: To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. We propose a mixed-initiative methodology called CultureCartography. Here, an LLM initializes an annotation with questions for which it has low-confidence answers, making explicit both its prior knowledge and the gaps therein. This allows a human respondent to fill these gaps and steer the model towards salient topics through direct edits. Compared to a baseline where humans answer LLM-proposed questions, we find that CultureExplorer, a tool implementing this methodology, more effectively produces knowledge that leading models like DeepSeek R1 and GPT-4o are missing, even with web search.
- Score: 50.502555170749694
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. How do we find such knowledge that is (1) salient to in-group users, but (2) unknown to LLMs? The most common solutions are single-initiative: either researchers define challenging questions that users passively answer (traditional annotation), or users actively produce data that researchers structure as benchmarks (knowledge extraction). The process would benefit from mixed-initiative collaboration, where users guide the process to meaningfully reflect their cultures, and LLMs steer the process towards more challenging questions that meet the researcher's goals. We propose a mixed-initiative methodology called CultureCartography. Here, an LLM initializes annotation with questions for which it has low-confidence answers, making explicit both its prior knowledge and the gaps therein. This allows a human respondent to fill these gaps and steer the model towards salient topics through direct edits. We implement this methodology as a tool called CultureExplorer. Compared to a baseline where humans answer LLM-proposed questions, we find that CultureExplorer more effectively produces knowledge that leading models like DeepSeek R1 and GPT-4o are missing, even with web search. Fine-tuning on this data boosts the accuracy of Llama-3.1-8B by up to 19.2% on related culture benchmarks.
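The abstract says the LLM seeds annotation with questions for which it has low-confidence answers, but it does not say how confidence is measured. The sketch below is one possible reading, not the authors' implementation: it assumes confidence is estimated by agreement among several sampled answers per question. The function names, the 0.6 threshold, and the example data are all hypothetical.

```python
from collections import Counter


def answer_confidence(samples):
    """Share of sampled answers that match the most common answer.

    Assumption: agreement among samples stands in for model confidence.
    """
    if not samples:
        return 0.0
    normalized = [s.strip().lower() for s in samples]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def select_low_confidence(question_samples, threshold=0.6):
    """Return (question, confidence) pairs below threshold, least confident first."""
    scored = [(q, answer_confidence(s)) for q, s in question_samples.items()]
    low = [(q, c) for q, c in scored if c < threshold]
    return sorted(low, key=lambda qc: qc[1])


# Hypothetical sampled answers for two candidate questions.
example = {
    "What is served at a typical wedding breakfast?": ["rice", "bread", "soup", "rice"],
    "What is the capital of France?": ["Paris", "paris", "Paris ", "Paris"],
}

# Only the question with disagreeing samples is kept as an annotation seed.
seeds = select_low_confidence(example)
```

In a mixed-initiative setup like the one described, the selected questions and the model's tentative answers would then be shown to a human respondent, who can correct the answers or redirect the questions towards more salient topics.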
Related papers
- CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models. Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification. Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z)
- Prompting is not Enough: Exploring Knowledge Integration and Controllable Generation [89.65955788873532]
Open-domain question answering (OpenQA) represents a cornerstone in natural language processing (NLP). We propose a novel framework named GenKI, which aims to improve the OpenQA performance by exploring Knowledge Integration and controllable Generation.
arXiv Detail & Related papers (2025-05-26T08:18:33Z)
- From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs [62.9861554207279]
Adapting cultural values in Large Language Models (LLMs) presents significant challenges. Prior work primarily aligns LLMs with different cultural values using World Values Survey (WVS) data. We investigate WVS-based training for cultural value adaptation and find that relying solely on survey data can homogenize cultural norms and interfere with factual knowledge.
arXiv Detail & Related papers (2025-05-22T09:00:01Z)
- Towards Geo-Culturally Grounded LLM Generations [16.9281418974003]
Generative large language models (LLMs) have demonstrated gaps in diverse cultural awareness across the globe. We investigate the effect of retrieval augmented generation and search-grounding techniques on LLMs' ability to display familiarity with various national cultures.
arXiv Detail & Related papers (2025-02-19T07:29:58Z)
- CulturalBench: A Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming [75.82306181299153]
CulturalBench is a set of 1,696 human-written and human-verified questions to assess LMs' cultural knowledge. It covers 45 global regions including underrepresented ones like Bangladesh, Zimbabwe, and Peru. We construct CulturalBench using methods inspired by Human-AI Red-Teaming.
arXiv Detail & Related papers (2024-10-03T17:04:31Z)
- Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [98.09670425244462]
Large language models (LLMs) have demonstrated substantial commonsense understanding.
This paper examines the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks.
arXiv Detail & Related papers (2024-05-07T20:28:34Z)
- CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge [69.82940934994333]
We introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build a challenging evaluation dataset.
Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating cultural questions.
CULTURALBENCH-V0.1 is a compact yet high-quality evaluation dataset with users' red-teaming attempts.
arXiv Detail & Related papers (2024-04-10T00:25:09Z)
- Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion [16.563311988191636]
We construct an entity-centric knowledge store for each user based on their search and browsing activities on the web.
This knowledge store is light-weight, since it only produces user-specific aggregate projections of interests and knowledge onto public knowledge graphs.
arXiv Detail & Related papers (2023-11-10T01:18:47Z)
- Can LLMs Grade Short-Answer Reading Comprehension Questions : An Empirical Study with a Novel Dataset [0.0]
This paper investigates the potential for the newest version of Large Language Models (LLMs) to be used in short answer questions for formative assessments.
It introduces a novel dataset of short answer reading comprehension questions, drawn from a set of reading assessments conducted with over 150 students in Ghana.
The paper empirically evaluates how well various configurations of generative LLMs grade student short answer responses compared to expert human raters.
arXiv Detail & Related papers (2023-10-26T17:05:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.