On the Scaling Laws of Geographical Representation in Language Models
- URL: http://arxiv.org/abs/2402.19406v2
- Date: Mon, 4 Mar 2024 11:35:02 GMT
- Title: On the Scaling Laws of Geographical Representation in Language Models
- Authors: Nathan Godey, Éric de la Clergerie, Benoît Sagot
- Abstract summary: We show that geographical knowledge is observable even for tiny models, and that it scales consistently as we increase the model size.
Notably, we observe that larger language models cannot mitigate the geographical bias that is inherent to the training data.
- Score: 0.11510009152620666
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Language models have long been shown to embed geographical information in
their hidden representations. This line of work has recently been revisited by
extending this result to Large Language Models (LLMs). In this paper, we
propose to fill the gap between well-established and recent literature by
observing how geographical knowledge evolves when scaling language models. We
show that geographical knowledge is observable even for tiny models, and that
it scales consistently as we increase the model size. Notably, we observe that
larger language models cannot mitigate the geographical bias that is inherent
to the training data.
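To make the idea of "geographical information in hidden representations" concrete, the sketch below fits a simple linear probe that regresses country coordinates from a pretrained model's hidden states. This is an illustrative assumption, not the paper's actual protocol: the checkpoint name (`gpt2`), the mean-pooling choice, and the tiny hard-coded coordinate table are all placeholders for whatever model, layer, and gazetteer a real study would use.

```python
# Hedged sketch: probe a language model's hidden states for geographic signal.
# All specifics (model, pooling, toy coordinates) are assumptions for illustration.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Toy (longitude, latitude) targets; a real probe would use a full country/city gazetteer.
coords = {
    "France": (2.2, 46.2), "Japan": (138.3, 36.2),
    "Brazil": (-51.9, -14.2), "Kenya": (37.9, -0.02),
    "Canada": (-106.3, 56.1), "India": (78.9, 20.6),
}

tok = AutoTokenizer.from_pretrained("gpt2")        # any causal LM checkpoint would do
model = AutoModel.from_pretrained("gpt2").eval()

def embed(name: str) -> torch.Tensor:
    """Mean-pool the last hidden layer over the tokens of a place name."""
    ids = tok(name, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids).last_hidden_state    # shape (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

X = torch.stack([embed(c) for c in coords]).numpy()
y = [coords[c] for c in coords]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("held-out R^2:", probe.score(X_te, y_te))    # higher = more recoverable geographic signal
```

Repeating this probe across checkpoints of increasing size is one way to trace how geographic signal scales with model size, which is the kind of comparison the abstract describes.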
Related papers
- Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations [2.825324306665133]
This study focuses on biases related to geographical knowledge.
We explore the connection between geography and language models by highlighting their tendency to misrepresent spatial information.
arXiv Detail & Related papers (2024-04-26T13:22:28Z) - Pixel Aligned Language Models [94.32841818609914]
We develop a vision-language model that can take locations as either inputs or outputs.
When taking locations as inputs, the model performs location-conditioned captioning, which generates captions for the indicated object or region.
Our model is pre-trained on the Localized Narrative dataset, which contains pixel-word-aligned captioning from human attention.
arXiv Detail & Related papers (2023-12-14T18:57:58Z) - Geographical Erasure in Language Generation [13.219867587151986]
We study and operationalise a form of geographical erasure, wherein language models underpredict certain countries.
We discover that erasure strongly correlates with low frequencies of country mentions in the training corpus.
We mitigate erasure by finetuning using a custom objective.
arXiv Detail & Related papers (2023-10-23T10:26:14Z) - Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing that large language models fall behind, are comparable to, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z) - Geographic and Geopolitical Biases of Language Models [43.62238334380897]
We propose an approach to study the geographic bias (and knowledge) present in pretrained language models (PLMs).
Our findings suggest PLMs' representations map surprisingly well to the physical world in terms of country-to-country associations.
Last, we explain how large PLMs, despite exhibiting notions of geographical proximity, over-amplify geopolitical favoritism at inference time.
arXiv Detail & Related papers (2022-12-20T16:32:54Z) - Measuring Geographic Performance Disparities of Offensive Language Classifiers [12.545108947857802]
We ask two questions: "Does language, dialect, and topical content vary across geographical regions?" and "If there are differences across the regions, do they impact model performance?"
We find that current models do not generalize across locations. Likewise, we show that while offensive language models produce false positives on African American English, model performance is not correlated with each city's minority population proportions.
arXiv Detail & Related papers (2022-09-15T15:08:18Z) - Do Language Models Know the Way to Rome? [4.344337854565144]
We exploit the fact that in geography, ground truths are available beyond local relations.
We find that language models generally encode limited geographic information, but with larger models performing the best.
arXiv Detail & Related papers (2021-09-16T13:28:16Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z) - Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)