Geographic and Geopolitical Biases of Language Models
- URL: http://arxiv.org/abs/2212.10408v1
- Date: Tue, 20 Dec 2022 16:32:54 GMT
- Title: Geographic and Geopolitical Biases of Language Models
- Authors: Fahim Faisal, Antonios Anastasopoulos
- Abstract summary: We propose an approach to study the geographic bias (and knowledge) present in pretrained language models (PLMs).
Our findings suggest PLMs' representations map surprisingly well to the physical world in terms of country-to-country associations.
Last, we explain how large PLMs, despite exhibiting notions of geographical proximity, over-amplify geopolitical favouritism at inference time.
- Score: 43.62238334380897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models (PLMs) often fail to fairly represent target users
from certain world regions because of the under-representation of those regions
in training datasets. With recent PLMs trained on enormous data sources,
quantifying their potential biases is difficult, due to their black-box nature
and the sheer scale of the data sources. In this work, we devise an approach to
study the geographic bias (and knowledge) present in PLMs, proposing a
Geographic-Representation Probing Framework adopting a self-conditioning method
coupled with entity-country mappings. Our findings suggest PLMs'
representations map surprisingly well to the physical world in terms of
country-to-country associations, but this knowledge is unequally shared across
languages. Last, we explain how large PLMs despite exhibiting notions of
geographical proximity, over-amplify geopolitical favouritism at inference
time.
Related papers
- Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input [2.516307239032451]
We present a method which represents real-world locations as averaged embeddings from labeled user-input location names.
We show that our approach improves geo-entity linking on a global and multilingual social media dataset.
arXiv Detail & Related papers (2024-04-29T15:18:33Z)
- Large Language Models are Geographically Biased [47.88767211956144]
We study what Large Language Models (LLMs) know about the world we live in through the lens of geography.
We show various problematic geographic biases, which we define as systemic errors in geospatial predictions.
arXiv Detail & Related papers (2024-02-05T02:32:09Z)
- Geographical Erasure in Language Generation [13.219867587151986]
We study and operationalise a form of geographical erasure, wherein language models underpredict certain countries.
We discover that erasure strongly correlates with low frequencies of country mentions in the training corpus.
We mitigate erasure by finetuning using a custom objective.
arXiv Detail & Related papers (2023-10-23T10:26:14Z)
- GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding [45.36562604939258]
This paper introduces GeoLM, a language model that enhances the understanding of geo-entities in natural language.
We demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing.
arXiv Detail & Related papers (2023-10-23T01:20:01Z)
- GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark [56.08664336835741]
We propose a GeoGraphic Language Understanding Evaluation benchmark, named GeoGLUE.
We collect data from open-released geographic resources and introduce six natural language understanding tasks.
We provide evaluation experiments and analysis of general baselines, indicating the effectiveness and significance of the GeoGLUE benchmark.
arXiv Detail & Related papers (2023-05-11T03:21:56Z)
- GeoNet: Benchmarking Unsupervised Adaptation across Geographies [71.23141626803287]
We study the problem of geographic robustness and make three main contributions.
First, we introduce a large-scale dataset GeoNet for geographic adaptation.
Second, we hypothesize that the major source of domain shifts arises from significant variations in scene context.
Third, we conduct an extensive evaluation of several state-of-the-art unsupervised domain adaptation algorithms and architectures.
arXiv Detail & Related papers (2023-03-27T17:59:34Z)
- Geographic Adaptation of Pretrained Language Models [29.81557992080902]
We introduce geoadaptation, an intermediate training step that couples language modeling with geolocation prediction in a multi-task learning setup.
We show that the effectiveness of geoadaptation stems from its ability to geographically retrofit the representation space of the pretrained language models.
arXiv Detail & Related papers (2022-03-16T11:55:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.