Crosslinguistic word order variation reflects evolutionary pressures of
dependency and information locality
- URL: http://arxiv.org/abs/2206.04239v1
- Date: Thu, 9 Jun 2022 02:56:53 GMT
- Title: Crosslinguistic word order variation reflects evolutionary pressures of
dependency and information locality
- Authors: Michael Hahn, Yang Xu
- Abstract summary: About 40% of the world's languages have subject-verb-object order, and about 40% have subject-object-verb order.
We show that variation in word order reflects different ways of balancing competing pressures of dependency locality and information locality.
Our findings suggest that syntactic structure and usage across languages co-adapt to support efficient communication under limited cognitive resources.
- Score: 4.869029215261254
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Languages vary considerably in syntactic structure. About 40% of the world's
languages have subject-verb-object order, and about 40% have
subject-object-verb order. Extensive work has sought to explain this word order
variation across languages. However, the existing approaches are not able to
explain coherently the frequency distribution and evolution of word order in
individual languages. We propose that variation in word order reflects
different ways of balancing competing pressures of dependency locality and
information locality, whereby languages favor placing elements together when
they are syntactically related or contextually informative about each other.
Using data from 80 languages in 17 language families and phylogenetic modeling,
we demonstrate that languages evolve to balance these pressures, such that word
order change is accompanied by change in the frequency distribution of the
syntactic structures which speakers communicate to maintain overall efficiency.
Variability in word order thus reflects different ways in which languages
resolve these evolutionary pressures. We identify relevant characteristics that
result from this joint optimization, particularly the frequency with which
subjects and objects are expressed together for the same verb. Our findings
suggest that syntactic structure and usage across languages co-adapt to support
efficient communication under limited cognitive resources.
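The dependency-locality pressure described in the abstract can be made concrete with a toy calculation: total dependency length is the sum of linear distances between syntactically related words, and different orders of the same syntactic structure yield different totals. A minimal sketch (the sentence, its dependency pairs, and both orders are illustrative, not taken from the paper):

```python
def total_dependency_length(order, deps):
    """Sum of linear distances between heads and dependents.

    order: list of words in linear order
    deps:  list of (head, dependent) pairs
    """
    pos = {w: i for i, w in enumerate(order)}
    return sum(abs(pos[h] - pos[d]) for h, d in deps)

# Toy sentence: a verb with a subject phrase and an object phrase.
deps = [("ate", "dog"), ("ate", "bone"), ("dog", "the"), ("bone", "a")]
svo = ["the", "dog", "ate", "a", "bone"]  # subject-verb-object
sov = ["the", "dog", "a", "bone", "ate"]  # subject-object-verb

print(total_dependency_length(svo, deps))  # 1 + 2 + 1 + 1 = 5
print(total_dependency_length(sov, deps))  # 3 + 1 + 1 + 1 = 6
```

In this toy case SVO yields a shorter total than SOV, but the trade-off reverses for other structures, which is why the paper frames word order as balancing competing locality pressures rather than minimizing a single one.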
Related papers
- Assessing the Role of Lexical Semantics in Cross-lingual Transfer through Controlled Manipulations [15.194196775504613]
We analyze how differences between English and a target language influence the capacity to align the language with an English pretrained representation space.
We show that while properties such as the script or word order only have a limited impact on alignment quality, the degree of lexical matching between the two languages, which we define using a measure of translation entropy, greatly affects it.
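A translation-entropy measure of lexical matching can be sketched as the Shannon entropy of the distribution of translations observed for a source word: zero when a word has a single consistent translation, higher when its translations are spread out. This is only a hedged illustration of the general idea, not the paper's exact definition:

```python
import math
from collections import Counter

def translation_entropy(translations):
    """Shannon entropy (bits) of the observed translation distribution
    for one source word; 0.0 means a single consistent match."""
    counts = Counter(translations)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(translation_entropy(["perro"] * 4))         # 0.0: one-to-one match
print(translation_entropy(["perro", "can"] * 2))  # 1.0: two equally likely
```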
arXiv Detail & Related papers (2024-08-14T14:59:20Z)
- Linguistic Structure from a Bottleneck on Sequential Information Processing [5.850665541267672]

We show that natural-language-like systematicity arises in codes that are constrained by predictive information.
We show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics.
arXiv Detail & Related papers (2024-05-20T15:25:18Z)
- A Cross-Linguistic Pressure for Uniform Information Density in Word Order [79.54362557462359]
We use computational models to test whether real orders lead to greater information uniformity than counterfactual orders.
Among SVO languages, real word orders consistently have greater uniformity than reverse word orders.
Only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders.
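Uniform information density is commonly operationalized as low variance of per-word surprisal: a uniform order spreads information evenly, while a bursty order alternates highly surprising and highly predictable words. A minimal sketch with hypothetical surprisal values (not from the paper):

```python
import statistics

def uid_variance(surprisals):
    """Population variance of per-word surprisal (bits);
    lower variance = more uniform information density."""
    return statistics.pvariance(surprisals)

real_order = [3.0, 2.5, 3.5, 3.0]     # hypothetical per-word surprisals
reverse_order = [6.0, 1.0, 5.5, 0.5]  # same total information, but bursty

print(uid_variance(real_order) < uid_variance(reverse_order))  # True
```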
arXiv Detail & Related papers (2023-06-06T14:52:15Z)
- Reliable Detection and Quantification of Selective Forces in Language Change [3.55026004901472]
We apply a recently-introduced method to corpus data to quantify the strength of selection in specific instances of historical language change.
We show that this method is more reliable and interpretable than similar methods that have previously been applied.
arXiv Detail & Related papers (2023-05-25T10:20:15Z)
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Subdiffusive semantic evolution in Indo-European languages [0.0]
We find that semantic evolution is strongly subdiffusive across five major Indo-European languages.
We show that words follow trajectories in meaning space with an anomalous diffusion exponent.
We furthermore show that strong subdiffusion is a robust phenomenon under a wide variety of choices in data analysis and interpretation.
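The anomalous diffusion exponent mentioned here is conventionally the exponent alpha in MSD(t) ∝ t^alpha, estimated as the slope of log(MSD) against log(t); alpha < 1 indicates subdiffusion. A minimal sketch on a synthetic trajectory (the data are illustrative, not from the paper):

```python
import math

def diffusion_exponent(times, msd):
    """Least-squares slope of log(MSD) vs log(t); alpha < 1 => subdiffusion."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

times = [1, 2, 4, 8, 16]
msd = [t ** 0.5 for t in times]  # synthetic subdiffusive data, alpha = 0.5
print(round(diffusion_exponent(times, msd), 3))  # 0.5
```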
arXiv Detail & Related papers (2022-09-10T15:57:32Z)
- When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer [15.578267998149743]
We show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order.
There is a strong correlation between transfer performance and word embedding alignment between languages.
Our results call for focus in multilingual models on explicitly improving word embedding alignment between languages.
arXiv Detail & Related papers (2021-10-27T21:25:39Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models fitted to the word order of the source language may fail to handle target languages with different orders.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.