Word length predicts word order: "Min-max"-ing drives language evolution
- URL: http://arxiv.org/abs/2505.13913v1
- Date: Tue, 20 May 2025 04:25:55 GMT
- Title: Word length predicts word order: "Min-max"-ing drives language evolution
- Authors: Hiram Ring
- Abstract summary: This paper proposes a universal underlying mechanism for word order change based on a large tagged parallel dataset of over 1,500 languages. Findings suggest an integrated "Min-Max" theory of language evolution driven by competing pressures of processing and information structure.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current theories of language propose an innate (Baker 2001; Chomsky 1981) or a functional (Greenberg 1963; Dryer 2007; Hawkins 2014) origin for the surface structures (i.e. word order) that we observe in languages of the world, while evolutionary modeling (Dunn et al. 2011) suggests that descent is the primary factor influencing such patterns. Although there are hypotheses for word order change from both innate and usage-based perspectives for specific languages and families, there are key disagreements between the two major proposals for mechanisms that drive the evolution of language more broadly (Wasow 2002; Levy 2008). This paper proposes a universal underlying mechanism for word order change based on a large tagged parallel dataset of over 1,500 languages representing 133 language families and 111 isolates. Results indicate that word class length is significantly correlated with word order crosslinguistically, but not in a straightforward manner, partially supporting opposing theories of processing, while at the same time predicting historical word order change in two different phylogenetic lines and explaining more variance than descent or language area in regression models. Such findings suggest an integrated "Min-Max" theory of language evolution driven by competing pressures of processing and information structure, aligning with recent efficiency-oriented (Levshina 2023) and information-theoretic proposals (Zaslavsky 2020; Tucker et al. 2025).
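The analysis the abstract describes lends itself to a small illustration. The sketch below is not the paper's pipeline: the input file, column names, and the logistic specification are assumptions; it only shows the general shape of testing whether word-class length predicts order beyond descent (family) and area.

```python
# Minimal sketch of the kind of analysis the abstract describes: correlating
# mean word-class length with dominant word order across languages, then
# comparing explained variance against family and area predictors.
# The input file and column names are hypothetical, not the paper's data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("languages.csv")  # hypothetical: one row per language with
# columns: noun_len, verb_len, order ("SOV"/"SVO"/...), family, area

df["len_ratio"] = df["noun_len"] / df["verb_len"]
df["is_sov"] = (df["order"] == "SOV").astype(int)

# Does word-class length predict order beyond descent (family) and area?
m_full = smf.logit("is_sov ~ len_ratio + C(family) + C(area)", data=df).fit()
m_base = smf.logit("is_sov ~ C(family) + C(area)", data=df).fit()
print(m_full.summary())
print("pseudo-R2 gain from length:", m_full.prsquared - m_base.prsquared)
```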
Related papers
- Robustness of the Random Language Model
The model suggests a simple picture of first language learning as a type of annealing in the vast space of potential languages.
It implies a single continuous transition to grammatical syntax, at which the symmetry among potential words and categories is spontaneously broken.
Results are discussed in light of the theory of first-language acquisition in linguistics and recent successes in machine learning.
arXiv Detail & Related papers (2023-09-26T13:14:35Z)
- Testing the Predictions of Surprisal Theory in 11 Languages
We investigate the relationship between surprisal and reading times in eleven different languages. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
arXiv Detail & Related papers (2023-07-07T15:37:50Z)
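For the surprisal entry above, a minimal sketch of the standard computation: per-token surprisal from a causal language model, which would then be regressed against reading times. The choice of gpt2 and the example sentence are placeholders, not the paper's multilingual setup.

```python
# Per-token surprisal from a causal language model, in bits.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def token_surprisals(text):
    """Return (token, surprisal in bits) pairs, starting from the 2nd token."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    # log-probability of each token given its left context
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nats = -logprobs[torch.arange(targets.numel()), targets]
    bits = (nats / math.log(2)).tolist()
    return list(zip(tok.convert_ids_to_tokens(targets.tolist()), bits))

print(token_surprisals("The horse raced past the barn fell."))
```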
- Crosslinguistic word order variation reflects evolutionary pressures of dependency and information locality
About 40% of the world's languages have subject-verb-object order, and about 40% have subject-object-verb order.
We show that variation in word order reflects different ways of balancing competing pressures of dependency locality and information locality.
Our findings suggest that syntactic structure and usage across languages co-adapt to support efficient communication under limited cognitive resources.
arXiv Detail & Related papers (2022-06-09T02:56:53Z)
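The dependency-locality pressure in the entry above is usually operationalized as the summed linear distance between each word and its syntactic head. A toy sketch, with an invented parse:

```python
# Summed head-dependent distance for one sentence, given a head list.
def total_dependency_length(heads):
    """heads[i] is the 1-based index of word i+1's head; 0 marks the root."""
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)

# "She gave him a book": "gave" is the root; "book" heads "a".
#  indices: She=1 gave=2 him=3 a=4 book=5
heads = [2, 0, 2, 5, 2]
print(total_dependency_length(heads))  # |1-2| + |3-2| + |4-5| + |5-2| = 6
```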
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
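The alignment measure in the entry above, bitext retrieval, reduces to nearest-neighbor search over sentence embeddings. A self-contained sketch with synthetic embeddings standing in for a multilingual encoder:

```python
# Bitext retrieval accuracy via cosine similarity over sentence embeddings.
import numpy as np

def retrieval_accuracy(src_emb, tgt_emb):
    """src_emb[i] and tgt_emb[i] embed translations of the same sentence.
    Returns the fraction of sources whose nearest target is correct."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T                    # cosine similarity matrix
    best = sims.argmax(axis=1)            # nearest target for each source
    return float((best == np.arange(len(src))).mean())

rng = np.random.default_rng(0)
src = rng.normal(size=(100, 64))
tgt = src + 0.1 * rng.normal(size=(100, 64))  # noisy "translations"
print(retrieval_accuracy(src, tgt))
```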
- Mechanism of Evolution Shared by Gene and Language
We propose a general mechanism for evolution to explain the diversity of gene and language.
We find that the classical correspondence, "domain plays the role of word in gene language", is not rigorous.
We devise a new evolution unit, syllgram, to include the characteristics of spoken and written language.
arXiv Detail & Related papers (2020-12-28T15:46:19Z)
- NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved a micro-averaged accuracy of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z)
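A minimal illustration of the frequentist idea in the NEMO entry above: estimate conditional frequencies between typological features from training languages, then predict a held-out feature by its most probable value given an observed one. The feature names and counts here are invented, not SIGTYP data.

```python
# Conditional-frequency prediction of one typological feature from another.
from collections import Counter, defaultdict

train = [  # (order of object and verb, adposition type) per toy language
    ("OV", "Postpositions"), ("OV", "Postpositions"), ("OV", "Prepositions"),
    ("VO", "Prepositions"), ("VO", "Prepositions"), ("VO", "Postpositions"),
]

cond = defaultdict(Counter)
for ov, adp in train:
    cond[ov][adp] += 1

def predict_adposition(ov_value):
    """Most frequent adposition type among languages sharing ov_value."""
    return cond[ov_value].most_common(1)[0][0]

print(predict_adposition("OV"))  # -> "Postpositions"
```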
- The optimality of syntactic dependency distances
We recast the problem of the optimality of the word order of a sentence as an optimization problem on a spatial network.
We introduce a new score to quantify the cognitive pressure to reduce the distance between linked words in a sentence.
The analysis of sentences from 93 languages reveals that half of the languages are optimized to 70% or more.
arXiv Detail & Related papers (2020-07-30T09:40:41Z)
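The score in the optimality entry above normalizes observed dependency length between a random-order baseline and the attainable minimum. A brute-force sketch for toy-sized sentences; the paper's exact formulation may differ:

```python
# Optimality of a word order: 1 at the minimum dependency length, 0 at the
# random-order expectation. Brute force over permutations, so toy-sized only.
import itertools, statistics

def dep_length(order, edges):
    pos = {w: i for i, w in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in edges)

def optimality(order, edges):
    lengths = [dep_length(p, edges) for p in itertools.permutations(order)]
    d_obs = dep_length(order, edges)
    d_min = min(lengths)
    d_rand = statistics.mean(lengths)   # expectation over random orders
    return (d_rand - d_obs) / (d_rand - d_min)

edges = [(1, 2), (3, 2), (4, 5), (5, 2)]   # heads from the earlier toy parse
print(optimality([1, 2, 3, 4, 5], edges))  # ~0.67: better than random
```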
- Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling
Cross-lingual models that fit the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
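The two predictors in the neologism entry above can be given a rough distributional form: neighborhood density (the inverse of semantic sparsity) and the frequency growth rate of a word's nearest semantic neighbors. All vectors and counts below are toy stand-ins, not the paper's data or definitions.

```python
# Rough distributional versions of the two predictors of word emergence.
import numpy as np

def neighborhood_density(vec, space, k=10):
    """Mean cosine similarity of a vector to its k nearest neighbors."""
    space = space / np.linalg.norm(space, axis=1, keepdims=True)
    v = vec / np.linalg.norm(vec)
    sims = space @ v
    return float(np.sort(sims)[-k:].mean())

def neighbor_growth(counts_t0, counts_t1):
    """Mean log frequency growth of the semantic neighbors between epochs."""
    t0, t1 = np.asarray(counts_t0), np.asarray(counts_t1)
    return float(np.mean(np.log((t1 + 1) / (t0 + 1))))

rng = np.random.default_rng(1)
space = rng.normal(size=(1000, 50))  # embeddings of existing words
cand = rng.normal(size=50)           # embedding of a candidate neologism
print(neighborhood_density(cand, space))
print(neighbor_growth([10, 4, 7], [30, 9, 14]))
```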
- Geospatial distributions reflect rates of evolution of features of language
We propose a model-based approach to the problem through the analysis of language change as a process combining vertical descent, spatial interactions, and mutations in both dimensions.
A notion of linguistic temperature emerges naturally from this analysis as a dimensionless measure of the propensity of a linguistic feature to undergo change.
We demonstrate how temperatures of linguistic features can be inferred from their present-day geospatial distributions, without recourse to information about their phylogenies.
arXiv Detail & Related papers (2018-01-29T17:24:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.