Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge
- URL: http://arxiv.org/abs/2511.00657v1
- Date: Sat, 01 Nov 2025 18:41:34 GMT
- Title: Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge
- Authors: Eshaan Tanwar, Anwoy Chatterjee, Michael Saxon, Alon Albalak, William Yang Wang, Tanmoy Chakraborty
- Abstract summary: Most multilingual question-answering benchmarks do not factor in regional diversity in the information they capture. XNationQA encompasses a total of 49,280 questions on the geography, culture, and history of nine countries, presented in seven languages. We benchmark eight standard multilingual LLMs on XNationQA and evaluate them using two novel transference metrics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most multilingual question-answering benchmarks, while covering a diverse pool of languages, do not factor in regional diversity in the information they capture and tend to be Western-centric. This introduces a significant gap in fairly evaluating multilingual models' comprehension of factual information from diverse geographical locations. To address this, we introduce XNationQA for investigating the cultural literacy of multilingual LLMs. XNationQA encompasses a total of 49,280 questions on the geography, culture, and history of nine countries, presented in seven languages. We benchmark eight standard multilingual LLMs on XNationQA and evaluate them using two novel transference metrics. Our analyses uncover a considerable discrepancy in the models' accessibility to culturally specific facts across languages. Notably, we often find that a model demonstrates greater knowledge of cultural information in English than in the dominant language of the respective culture. The models exhibit better performance in Western languages, although this does not necessarily translate to being more literate for Western countries, which is counterintuitive. Furthermore, we observe that models have a very limited ability to transfer knowledge across languages, particularly evident in open-source models.
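The abstract names "two novel transference metrics" but does not define them. As a purely illustrative sketch (an assumption, not the paper's actual definition), one natural way to quantify cross-lingual knowledge transfer is: of the question IDs a model answers correctly in a source language, what fraction does it also answer correctly in a target language?

```python
def accuracy(results: dict[str, bool]) -> float:
    """Fraction of questions answered correctly in one language."""
    return sum(results.values()) / len(results)


def transfer_score(source: dict[str, bool], target: dict[str, bool]) -> float:
    """Of the question IDs correct in `source`, the fraction also correct in `target`.

    Hypothetical formulation for illustration only; the paper's metrics may differ.
    """
    correct_src = {qid for qid, ok in source.items() if ok}
    if not correct_src:
        return 0.0
    return sum(target[qid] for qid in correct_src) / len(correct_src)


# Toy example: per-question correctness for the same facts asked in two languages.
english = {"q1": True, "q2": True, "q3": False, "q4": True}
hindi   = {"q1": True, "q2": False, "q3": True, "q4": True}

print(round(transfer_score(english, hindi), 3))  # 2 of 3 English-correct facts transfer
```

An asymmetric score like this would capture the paper's finding that models often know a culture's facts in English without being able to answer the same questions in the culture's dominant language.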
Related papers
- Grounding Multilingual Multimodal LLMs With Cultural Knowledge [48.95126394270723]
We propose a data-centric approach that grounds MLLMs in cultural knowledge. CulturalGround comprises 22 million high-quality, culturally-rich VQA pairs spanning 42 countries and 39 languages. We train an open-source MLLM, CulturalPangea, on CulturalGround, interleaving standard multilingual instruction-tuning data to preserve general abilities.
arXiv Detail & Related papers (2025-08-10T16:24:11Z) - MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs [37.98920430188422]
MAKIEval is an automatic multilingual framework for evaluating cultural awareness in large language models. It automatically identifies cultural entities in model outputs and links them to structured knowledge. We assess 7 LLMs developed in different parts of the world, encompassing both open-source and proprietary systems.
arXiv Detail & Related papers (2025-05-27T19:29:40Z) - Multilingual Prompting for Improving LLM Generation Diversity [17.303344767549337]
Large Language Models (LLMs) are known to lack cultural representation and overall diversity in their generations. We propose multilingual prompting: a prompting method which generates several variations of a base prompt with added cultural and linguistic cues from several cultures.
arXiv Detail & Related papers (2025-05-21T07:59:21Z) - CARE: Multilingual Human Preference Learning for Cultural Awareness [48.760262639641496]
We introduce CARE, a multilingual resource containing 3,490 culturally specific questions and 31.7k responses with human judgments. We demonstrate how a modest amount of high-quality native preferences improves cultural awareness across various LMs. Our analyses reveal that models with stronger initial cultural performance benefit more from alignment.
arXiv Detail & Related papers (2025-04-07T14:57:06Z) - Language Models' Factuality Depends on the Language of Inquiry [36.466186024957075]
We introduce a benchmark of 10,000 country-related facts across 13 languages. We propose three novel metrics: Factual Recall Score, Knowledge Transferability Score, and Cross-Lingual Factual Knowledge Transferability Score. Our results reveal fundamental weaknesses in today's state-of-the-art LMs.
arXiv Detail & Related papers (2025-02-25T08:27:18Z) - Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs [5.8210182389588105]
Large Language Models (LLMs) are becoming increasingly capable across global languages. However, the ability to communicate across languages does not necessarily translate to appropriate cultural representations. We compare two families of models: Google's Gemma models and OpenAI's turbo-series. We find no consistent relationships between language capabilities and cultural alignment.
arXiv Detail & Related papers (2025-02-23T11:02:41Z) - WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines [74.25764182510295]
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English. We introduce World Cuisines, a massive-scale benchmark for multilingual, multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points.
arXiv Detail & Related papers (2024-10-16T16:11:49Z) - CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark [68.21939124278065]
CVQA is a culturally-diverse multilingual Visual Question Answering benchmark designed to cover a rich set of languages and cultures.
CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions.
We benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models.
arXiv Detail & Related papers (2024-06-10T01:59:00Z) - MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models [65.10456412127405]
MLaKE is a benchmark for the adaptability of knowledge editing methods across five languages. MLaKE aggregates fact chains from Wikipedia across languages and generates questions in both free-form and multiple-choice formats. We evaluate the multilingual knowledge editing generalization capabilities of existing methods on MLaKE.
arXiv Detail & Related papers (2024-04-07T15:23:28Z) - Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP.
We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba.
Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region.
All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.