Related papers: Geographic and Geopolitical Biases of Language Models

Geographic and Geopolitical Biases of Language Models

URL: http://arxiv.org/abs/2212.10408v1
Date: Tue, 20 Dec 2022 16:32:54 GMT
Title: Geographic and Geopolitical Biases of Language Models
Authors: Fahim Faisal, Antonios Anastasopoulos
Abstract summary: We propose an approach to study the geographic bias (and knowledge) present in pretrained language models (PLMs) Our findings suggest PLMs' representations map surprisingly well to the physical world in terms of country-to-country associations. Last, we explain how large PLMs despite exhibiting notions of geographical proximity, over-amplify geopoliticalitism at inference time.
Score: 43.62238334380897
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pretrained language models (PLMs) often fail to fairly represent target users from certain world regions because of the under-representation of those regions in training datasets. With recent PLMs trained on enormous data sources, quantifying their potential biases is difficult, due to their black-box nature and the sheer scale of the data sources. In this work, we devise an approach to study the geographic bias (and knowledge) present in PLMs, proposing a Geographic-Representation Probing Framework adopting a self-conditioning method coupled with entity-country mappings. Our findings suggest PLMs' representations map surprisingly well to the physical world in terms of country-to-country associations, but this knowledge is unequally shared across languages. Last, we explain how large PLMs despite exhibiting notions of geographical proximity, over-amplify geopolitical favouritism at inference time.

Related papers

From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models [14.178064117544082]
Image geolocalization is important for applications in crisis response, digital forensics, and location-based intelligence.<n>Recent advances in large language models (LLMs) offer new opportunities for visual reasoning.<n>We introduce a benchmark called IMAGEO-Bench that systematically evaluates accuracy, distance error, geospatial bias, and reasoning process.
arXiv Detail & Related papers (2025-08-03T06:04:33Z)
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models [52.00270888041742]
We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries.<n>Our findings show significant geopolitical biases, with models favoring specific national narratives.<n>Simple debiasing prompts had a limited effect on reducing these biases.
arXiv Detail & Related papers (2025-06-07T10:45:17Z)
OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
multimodal large language models (LLMs) have opened new frontiers in artificial intelligence. We propose a MLLM (OmniGeo) tailored to geospatial applications. By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z)
Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input [2.516307239032451]
We present a method which represents real-world locations as averaged embeddings from labeled user-input location names. We show that our approach improves geo-entity linking on a global and multilingual social media dataset.
arXiv Detail & Related papers (2024-04-29T15:18:33Z)
Large Language Models are Geographically Biased [47.88767211956144]
We study what Large Language Models (LLMs) know about the world we live in through the lens of geography. We show various problematic geographic biases, which we define as systemic errors in geospatial predictions.
arXiv Detail & Related papers (2024-02-05T02:32:09Z)
Geographical Erasure in Language Generation [13.219867587151986]
We study and operationalise a form of geographical erasure, wherein language models underpredict certain countries. We discover that erasure strongly correlates with low frequencies of country mentions in the training corpus. We mitigate erasure by finetuning using a custom objective.
arXiv Detail & Related papers (2023-10-23T10:26:14Z)
GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding [45.36562604939258]
This paper introduces GeoLM, a language model that enhances the understanding of geo-entities in natural language. We demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing.
arXiv Detail & Related papers (2023-10-23T01:20:01Z)
GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models. We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods. Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark [56.08664336835741]
We propose a GeoGraphic Language Understanding Evaluation benchmark, named GeoGLUE. We collect data from open-released geographic resources and introduce six natural language understanding tasks. We pro vide evaluation experiments and analysis of general baselines, indicating the effectiveness and significance of the GeoGLUE benchmark.
arXiv Detail & Related papers (2023-05-11T03:21:56Z)
GeoNet: Benchmarking Unsupervised Adaptation across Geographies [71.23141626803287]
We study the problem of geographic robustness and make three main contributions. First, we introduce a large-scale dataset GeoNet for geographic adaptation. Second, we hypothesize that the major source of domain shifts arise from significant variations in scene context. Third, we conduct an extensive evaluation of several state-of-the-art unsupervised domain adaptation algorithms and architectures.
arXiv Detail & Related papers (2023-03-27T17:59:34Z)
Geographic Adaptation of Pretrained Language Models [29.81557992080902]
We introduce geoadaptation, an intermediate training step that couples language modeling with geolocation prediction in a multi-task learning setup. We show that the effectiveness of geoadaptation stems from its ability to geographically retrofit the representation space of the pretrained language models.
arXiv Detail & Related papers (2022-03-16T11:55:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.