Related papers: Global Patterns of Knowledge: Language, Genre, and the Geography of Knowledge

Global Patterns of Knowledge: Language, Genre, and the Geography of Knowledge

URL: http://arxiv.org/abs/2507.22271v1
Date: Tue, 29 Jul 2025 22:43:01 GMT
Title: Global Patterns of Knowledge: Language, Genre, and the Geography of Knowledge
Authors: Akira Matsui, Fujio Toriumi, Mitsuo Yoshida, Taichi Murayama, Shiori Hironaka,
Abstract summary: We use economic complexity analysis to understand the editing history of Wikipedia platforms.<n>We reveal that different language communities exhibit distinct specializations, particularly in cultural subjects.<n>Our findings suggest that while a common mode of knowledge production exists for standardized topics such as science, it is more diverse for cultural topics or controversial subjects such as conspiracy theories.
Score: 0.45666156207236525
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Online platforms, particularly Wikipedia, have become critical infrastructures for providing diverse linguistic and cultural contexts. This human-curated knowledge now forms the foundation for modern AI. However, we have not yet fully explored how knowledge production capability vary across languages and domains. Here, we address this gap by applying economic complexity analysis to understand the editing history of Wikipedia platforms. This approach allows us to infer the latent mode of ``knowledge-production'' of each language community from the diversity and specialization of its contributed content. We reveal that different language communities exhibit distinct specializations, particularly in cultural subjects. Furthermore, we map the global landscape of these production modes, finding that the structure of knowledge production strongly reflects geopolitical boundaries. Our findings suggest that while a common mode of knowledge production exists for standardized topics such as science, it is more diverse for cultural topics or controversial subjects such as conspiracy theories. The association between differences in knowledge production capability and geopolitical factors implies how linguistic and cultural dynamics shape our worldview and the biases embedded in Wikipedia data, a unique, massive, and essential dataset for modern AI.

Related papers

Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge [68.6805229085352]
Most multilingual question-answering benchmarks do not factor in regional diversity in the information they capture.<n>XNationQA encompasses a total of 49,280 questions on the geography, culture, and history of nine countries, presented in seven languages.<n>We benchmark eight standard multilingual LLMs on XNationQA and evaluate them using two novel transference metrics.
arXiv Detail & Related papers (2025-11-01T18:41:34Z)
CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs [57.653830744706305]
CultureScope is the most comprehensive evaluation framework to date for assessing cultural understanding in large language models.<n>Inspired by the cultural iceberg theory, we design a novel dimensional schema for cultural knowledge classification.<n> Experimental results demonstrate that our method can effectively evaluate cultural understanding.
arXiv Detail & Related papers (2025-09-19T17:47:48Z)
Animer une base de connaissance: des ontologies aux mod{è}les d'I.A. g{é}n{é}rative [0.0]
Article proposes a reading of the hybridization between symbolic AI and neural (or sub-symbolic) AI based on a field of application.<n>We describe the LaCAS ecosystem -- Open Archives in Linguistic and Cultural Studies.<n>We illustrate this approach using the knowledge domain ''Languages of the world'' (540 languages) and the knowledge object ''Quechua (language)''
arXiv Detail & Related papers (2025-09-01T09:40:55Z)
A Community-driven vision for a new Knowledge Resource for AI [59.29703403953085]
Despite the success of knowledge resources like WordNet, verifiable, general-purpose widely available sources of knowledge remain a critical deficiency in AI infrastructure.<n>This paper synthesizes our findings and outlines a community-driven vision for a new knowledge infrastructure.
arXiv Detail & Related papers (2025-06-19T20:51:28Z)
MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs [26.806566827956875]
MAKIEval is an automatic multilingual framework for evaluating cultural awareness in large language models.<n>It automatically identifies cultural entities in model outputs and links them to structured knowledge.<n>We assess 7 LLMs developed from different parts of the world, encompassing both open-source and proprietary systems.
arXiv Detail & Related papers (2025-05-27T19:29:40Z)
CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis [41.261808170896686]
CulFiT is a novel training paradigm that leverages multilingual data and fine-grained reward modeling to enhance cultural sensitivity and inclusivity.<n>Our approach synthesizes diverse cultural-related questions, constructs critique data in culturally relevant languages, and employs fine-grained rewards to decompose cultural texts into verifiable knowledge units.
arXiv Detail & Related papers (2025-05-26T04:08:26Z)
From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs [57.43233760384488]
Adapting cultural values in Large Language Models (LLMs) presents significant challenges.<n>Prior work primarily aligns LLMs with different cultural values using World Values Survey (WVS) data.<n>In this paper, we investigate WVS-based training for cultural value adaptation and find that relying solely on survey data cane cultural norms and interfere with factual knowledge.
arXiv Detail & Related papers (2025-05-22T09:00:01Z)
Language Specific Knowledge: Do Models Know Better in X than in English? [9.923619418000488]
Code-switching is a common phenomenon of alternating between different languages in the same utterance, thought, or conversation.<n>We coin the term Language Specific Knowledge (LSK) to represent this phenomenon.<n>We find that language models can perform better when using chain-of-thought reasoning in some languages other than English.
arXiv Detail & Related papers (2025-05-21T00:31:13Z)
Risks of Cultural Erasure in Large Language Models [4.613949381428196]
We argue for the need of metricizable evaluations of language technologies that interrogate and account for historical power inequities.<n>We probe representations that a language model produces about different places around the world when asked to describe these contexts.<n>We analyze the cultures represented in the travel recommendations produced by a set of language model applications.
arXiv Detail & Related papers (2025-01-02T04:57:50Z)
Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks. We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts. We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models [59.22460740026037]
"CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset is designed to evaluate the social and cultural variation of Large Language Models (LLMs) We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy.
arXiv Detail & Related papers (2024-05-22T20:19:10Z)
Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models [51.891804790725686]
Elements of World Knowledge (EWoK) is a framework for evaluating language models' understanding of conceptual knowledge underlying world modeling.<n>EWoK-core-1.0 is a dataset of 4,374 items covering 11 world knowledge domains.<n>All tested models perform worse than humans, with results varying drastically across domains.
arXiv Detail & Related papers (2024-05-15T17:19:42Z)
Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition. Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages. Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.