This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models
- URL: http://arxiv.org/abs/2305.14610v4
- Date: Tue, 2 Apr 2024 02:55:33 GMT
- Title: This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models
- Authors: Bryan Li, Samar Haider, Chris Callison-Burch
- Abstract summary: We show that large language models (LLMs) recall certain geographical knowledge inconsistently when queried in different languages.
As a targeted case study, we consider territorial disputes, an inherently controversial and multilingual task.
We propose a suite of evaluation metrics to precisely quantify bias and consistency in responses across different languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Do the Spratly Islands belong to China, the Philippines, or Vietnam? A pretrained large language model (LLM) may answer differently if asked in the languages of each claimant country: Chinese, Tagalog, or Vietnamese. This contrasts with a multilingual human, who would likely answer consistently. In this paper, we show that LLMs recall certain geographical knowledge inconsistently when queried in different languages -- a phenomenon we term geopolitical bias. As a targeted case study, we consider territorial disputes, an inherently controversial and multilingual task. We introduce BorderLines, a dataset of territorial disputes which covers 251 territories, each associated with a set of multiple-choice questions in the languages of each claimant country (49 languages in total). We also propose a suite of evaluation metrics to precisely quantify bias and consistency in responses across different languages. We then evaluate various multilingual LLMs on our dataset and metrics to probe their internal knowledge and use the proposed metrics to discover numerous inconsistencies in how these models respond in different languages. Finally, we explore several prompt modification strategies, aiming to either amplify or mitigate geopolitical bias, which highlights how brittle LLMs are and how they tailor their responses depending on cues from the interaction context. Our code and data are available at https://github.com/manestay/borderlines
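To make the notions of cross-lingual consistency and geopolitical bias concrete, below is a minimal sketch of how such scores could be computed from per-language answers to a single territorial-dispute question. This is an illustration only, not the authors' released code: the function names, the example responses, and the language-to-claimant mapping are assumptions, and the paper's actual metrics are defined in the BorderLines repository linked above.

```python
from itertools import combinations


def consistency_score(responses):
    """Fraction of language pairs whose answers agree.

    `responses` maps a query language code to the claimant the model chose,
    e.g. {"zh": "China", "tl": "Philippines", "vi": "Vietnam"}.
    """
    pairs = list(combinations(responses.values(), 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


def self_preference_rate(responses, claimant_of):
    """Fraction of claimant languages in which the model picks that language's
    own claimant country (one rough notion of geopolitical bias).

    `claimant_of` maps each language to the claimant country that speaks it,
    e.g. {"zh": "China", "tl": "Philippines", "vi": "Vietnam"}.
    """
    langs = [lang for lang in responses if lang in claimant_of]
    if not langs:
        return 0.0
    return sum(responses[lang] == claimant_of[lang] for lang in langs) / len(langs)


# Hypothetical answers for the Spratly Islands example from the abstract.
responses = {"zh": "China", "tl": "Philippines", "vi": "China"}
claimant_of = {"zh": "China", "tl": "Philippines", "vi": "Vietnam"}

print(consistency_score(responses))                   # about 0.33: only 1 of 3 language pairs agree
print(self_preference_rate(responses, claimant_of))   # about 0.67: 2 of 3 languages favor "their" claimant
```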
Related papers
- Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs [31.893686987768742]
Language models are inconsistent in their ability to answer the same factual question across languages.
We explore multilingual factual knowledge through two aspects: the model's ability to answer a query consistently across languages, and the ability to "store" answers in a shared representation for several languages.
arXiv Detail & Related papers (2024-08-20T08:38:30Z)
- Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP.
Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions.
We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ [16.637598165238934]
Large language models (LLMs) need to serve everyone, including a global majority of non-English speakers.
Recent research shows that, despite limits in their intended use, people prompt LLMs in many different languages.
We introduce MultiQ, a new silver standard benchmark for basic open-ended question answering with 27.4k test questions.
arXiv Detail & Related papers (2024-03-06T16:01:44Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Towards Measuring the Representation of Subjective Global Opinions in Language Models [26.999751306332165]
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues.
We develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to.
We release our dataset for others to use and build on.
arXiv Detail & Related papers (2023-06-28T17:31:53Z)
- GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models [68.50584946761813]
We introduce a framework for geo-diverse commonsense probing on multilingual pre-trained language models (mPLMs).
We benchmark 11 standard mPLMs, including variants of mBERT, XLM, mT5, and XGLM, on the GeoMLAMA dataset.
We find that 1) larger mPLM variants do not necessarily store geo-diverse concepts better than their smaller counterparts; 2) mPLMs are not intrinsically biased towards knowledge from Western countries; and 3) a language may better probe knowledge about a non-native country than about its native country.
arXiv Detail & Related papers (2022-05-24T17:54:50Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.