Geospatial distributions reflect rates of evolution of features of language
- URL: http://arxiv.org/abs/1801.09637v2
- Date: Tue, 28 Jan 2025 12:54:00 GMT
- Title: Geospatial distributions reflect rates of evolution of features of language
- Authors: Henri Kauhanen, Deepthi Gopal, Tobias Galla, Ricardo Bermúdez-Otero
- Abstract summary: We propose a model-based approach to the problem through the analysis of language change as a process combining vertical descent, spatial interactions, and mutations in both dimensions.
A notion of linguistic temperature emerges naturally from this analysis as a dimensionless measure of the propensity of a linguistic feature to undergo change.
We demonstrate how temperatures of linguistic features can be inferred from their present-day geospatial distributions, without recourse to information about their phylogenies.
- Score: 0.0
- Abstract: Quantifying the speed of linguistic change is challenging because the historical evolution of languages is sparsely documented. Consequently, traditional methods rely on phylogenetic reconstruction. In this paper, we propose a model-based approach to the problem through the analysis of language change as a stochastic process combining vertical descent, spatial interactions, and mutations in both dimensions. A notion of linguistic temperature emerges naturally from this analysis as a dimensionless measure of the propensity of a linguistic feature to undergo change. We demonstrate how temperatures of linguistic features can be inferred from their present-day geospatial distributions, without recourse to information about their phylogenies. Thus the evolutionary dynamics of language, operating across thousands of years, leave a measurable geospatial signature. This signature licenses inferences about the historical evolution of languages even in the absence of longitudinal data.
Related papers
- Patterns of Persistence and Diffusibility across the World's Languages [3.7055269158186874]
Colexification is a type of similarity where a single lexical form is used to convey multiple meanings.
We shed light on the linguistic causes of cross-lingual similarity in colexification and phonology.
We construct large-scale graphs incorporating semantic, genealogical, phonological and geographical data for 1,966 languages.
arXiv Detail & Related papers (2024-01-03T12:05:38Z)
- Reliable Detection and Quantification of Selective Forces in Language Change [3.55026004901472]
We apply a recently-introduced method to corpus data to quantify the strength of selection in specific instances of historical language change.
We show that this method is more reliable and interpretable than similar methods that have previously been applied.
arXiv Detail & Related papers (2023-05-25T10:20:15Z)
- Evolution of grammatical forms: some quantitative approaches [0.0]
Grammatical forms are said to evolve via two main mechanisms.
These are the 'descent' mechanism and the 'contact' mechanism.
We use ideas and concepts from statistical physics to formulate a series of static and dynamical models.
arXiv Detail & Related papers (2023-02-06T09:50:48Z)
- Subdiffusive semantic evolution in Indo-European languages [0.0]
We find that semantic evolution is strongly subdiffusive across five major Indo-European languages.
We show that words follow trajectories in meaning space with an anomalous diffusion exponent.
We furthermore show that strong subdiffusion is a robust phenomenon under a wide variety of choices in data analysis and interpretation.
arXiv Detail & Related papers (2022-09-10T15:57:32Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Creolizing the Web [2.393911349115195]
We present a method for detecting evolutionary patterns in a sociological model of language evolution.
We develop a minimalistic model that provides a rigorous base for any generalized evolutionary model for language based on communication between individuals.
We present empirical results and their interpretations on a real-world dataset from rdt to identify communities and echo chambers for opinions.
arXiv Detail & Related papers (2021-02-24T16:08:45Z)
- Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
arXiv Detail & Related papers (2021-01-22T12:04:08Z)
- Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.