The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities
- URL: http://arxiv.org/abs/2508.05525v1
- Date: Thu, 07 Aug 2025 15:53:30 GMT
- Title: The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities
- Authors: Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei, Sashank Varma, Yi-Chia Wang, Ali Emami
- Abstract summary: Large Language Models (LLMs) have been extensively tuned to mitigate explicit biases, yet they often exhibit subtle implicit biases rooted in their pre-training data. We propose studying how models behave when they proactively ask questions themselves. The 20 Questions game, a multi-turn deduction task, serves as an ideal testbed for this purpose.
- Score: 12.46765303763981
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have been extensively tuned to mitigate explicit biases, yet they often exhibit subtle implicit biases rooted in their pre-training data. Rather than directly probing LLMs with human-crafted questions that may trigger guardrails, we propose studying how models behave when they proactively ask questions themselves. The 20 Questions game, a multi-turn deduction task, serves as an ideal testbed for this purpose. We systematically evaluate geographic performance disparities in entity deduction using a new dataset, Geo20Q+, consisting of both notable people and culturally significant objects (e.g., foods, landmarks, animals) from diverse regions. We test popular LLMs across two gameplay configurations (canonical 20-question and unlimited turns) and in seven languages (English, Hindi, Mandarin, Japanese, French, Spanish, and Turkish). Our results reveal geographic disparities: LLMs are substantially more successful at deducing entities from the Global North than the Global South, and the Global West than the Global East. While Wikipedia pageviews and pre-training corpus frequency correlate mildly with performance, they fail to fully explain these disparities. Notably, the language in which the game is played has minimal impact on performance gaps. These findings demonstrate the value of creative, free-form evaluation frameworks for uncovering subtle biases in LLMs that remain hidden in standard prompting setups. By analyzing how models initiate and pursue reasoning goals over multiple turns, we find geographic and cultural disparities embedded in their reasoning processes. We release the dataset (Geo20Q+) and code at https://sites.google.com/view/llmbias20q/home.
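For intuition, the gameplay loop can be pictured as follows. This is a minimal sketch, not the authors' released implementation: the `guesser` and `oracle` callables stand in for LLM API calls, and all names here are illustrative.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (guesser question, oracle answer)

def play_20q(
    target_entity: str,
    guesser: Callable[[List[Turn]], str],
    oracle: Callable[[str, str], str],
    max_turns: int = 20,  # canonical setting; the paper also tests unlimited turns
) -> bool:
    """Run one 20 Questions game; True if the entity is deduced in time."""
    history: List[Turn] = []
    for _ in range(max_turns):
        # The model under test proactively chooses its own question
        # (or a final guess) given the dialogue so far.
        question = guesser(history)
        # An oracle answers truthfully about the hidden entity:
        # "yes", "no", or "correct" when the guess names the entity.
        answer = oracle(target_entity, question)
        history.append((question, answer))
        if answer == "correct":
            return True
    return False
```

Aggregating win rates over entity pools grouped by region (e.g., Global North vs. Global South) then surfaces the disparities the paper reports.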
Related papers
- Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in the reasoning tasks of Large Language Models (LLMs). We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
- ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning. We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics. Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z)
- LIBRA: Measuring Bias of Large Language Model from a Local Context [9.612845616659776]
Large Language Models (LLMs) have significantly advanced natural language processing applications. Yet their widespread use raises concerns about inherent biases that may reduce their utility for, or cause harm to, particular social groups. This research addresses these limitations with a Local Integrated Bias Recognition and Assessment Framework (LIBRA) for measuring bias.
arXiv Detail & Related papers (2025-02-02T04:24:57Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, it is unclear how well Large Language Models (LLMs) represent diverse groups.
By including additional context in prompts, we analyze LLMs' sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z)
- Large Language Models are Geographically Biased [47.88767211956144]
We study what Large Language Models (LLMs) know about the world we live in through the lens of geography.
We show various problematic geographic biases, which we define as systemic errors in geospatial predictions.
arXiv Detail & Related papers (2024-02-05T02:32:09Z)
- Geographical Erasure in Language Generation [13.219867587151986]
We study and operationalise a form of geographical erasure, wherein language models underpredict certain countries.
We discover that erasure strongly correlates with low frequencies of country mentions in the training corpus.
We mitigate erasure by finetuning using a custom objective.
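As a rough illustration of how such underprediction can be quantified, here is a sketch under the assumption that erasure is flagged by comparing a model's country-name probabilities against a reference distribution such as population share; the function name, threshold, and values are invented, and the paper's exact objective may differ.

```python
# Hypothetical sketch: flag countries whose model probability under a
# prompt like "I live in ..." falls far below their real population share.
# model_probs would come from an LM's next-token distribution; the numbers
# below are made up purely for illustration.

def erased_countries(model_probs: dict, population_share: dict, factor: float = 3.0):
    """Return countries underpredicted by more than `factor` vs. the reference."""
    return [
        country
        for country, ref in population_share.items()
        if model_probs.get(country, 0.0) * factor < ref
    ]

print(erased_countries(
    model_probs={"France": 0.04, "Nigeria": 0.002},
    population_share={"France": 0.008, "Nigeria": 0.026},
))  # -> ['Nigeria']
```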
arXiv Detail & Related papers (2023-10-23T10:26:14Z)
- Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs).
We formulate a novel task, Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve close to random performance on the task.
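To make the task concrete, here is an illustrative Corr2Cause-style instance; the wording and label are invented for intuition and not drawn from the dataset.

```python
# Illustrative Corr2Cause-style instance (invented for intuition, not
# copied from the dataset): decide whether the causal hypothesis follows
# from the correlational premise alone.
premise = "Suppose there are two variables, A and B. A correlates with B."
hypothesis = "A causes B."
label = "not entailed"  # B could cause A, or a confounder could drive both
```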
arXiv Detail & Related papers (2023-06-09T12:09:15Z)
- This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models [40.61046400448044]
We show that large language models (LLMs) recall certain geographical knowledge inconsistently when queried in different languages.
As a targeted case study, we consider territorial disputes, an inherently controversial and multilingual task.
We propose a suite of evaluation metrics to precisely quantify bias and consistency in responses across different languages.
arXiv Detail & Related papers (2023-05-24T01:16:17Z)
- Event knowledge in large language models: the gap between the impossible and the unlikely [46.540380831486125]
We show that pre-trained large language models (LLMs) possess substantial event knowledge.
They almost always assign higher likelihood to possible vs. impossible events.
However, they show less consistent preferences for likely vs. unlikely events.
arXiv Detail & Related papers (2022-12-02T23:43:18Z)