A Cross-Linguistic Pressure for Uniform Information Density in Word
Order
- URL: http://arxiv.org/abs/2306.03734v2
- Date: Sun, 9 Jul 2023 17:17:39 GMT
- Title: A Cross-Linguistic Pressure for Uniform Information Density in Word
Order
- Authors: Thomas Hikaru Clark, Clara Meister, Tiago Pimentel, Michael Hahn, Ryan
Cotterell, Richard Futrell and Roger Levy
- Abstract summary: We use computational models to test whether real orders lead to greater information uniformity than counterfactual orders.
Among SVO languages, real word orders consistently have greater uniformity than reverse word orders.
Only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders.
- Score: 79.54362557462359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While natural languages differ widely in both canonical word order and word
order flexibility, their word orders still follow shared cross-linguistic
statistical patterns, often attributed to functional pressures. In the effort
to identify these pressures, prior work has compared real and counterfactual
word orders. Yet one functional pressure has been overlooked in such
investigations: the uniform information density (UID) hypothesis, which holds
that information should be spread evenly throughout an utterance. Here, we ask
whether a pressure for UID may have influenced word order patterns
cross-linguistically. To this end, we use computational models to test whether
real orders lead to greater information uniformity than counterfactual orders.
In our empirical study of 10 typologically diverse languages, we find that: (i)
among SVO languages, real word orders consistently have greater uniformity than
reverse word orders, and (ii) only linguistically implausible counterfactual
orders consistently exceed the uniformity of real orders. These findings are
compatible with a pressure for information uniformity in the development and
usage of natural languages.
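The comparison described in the abstract can be illustrated with a toy sketch: train a language model on each word-order variant, compute per-word surprisal, and compare how evenly surprisal is spread. Everything specific here is an assumption made for illustration and is not taken from the paper: the add-one-smoothed bigram model, the use of per-sentence surprisal variance as the uniformity measure (lower variance means more uniform), the three-sentence corpus, and all function names. Scoring the training sentences themselves is also a simplification.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect context and bigram counts; each sentence is a list of tokens."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent  # "<s>" marks the sentence start
        vocab.update(sent)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def surprisals(sent, model):
    """Per-token surprisal in bits under an add-one-smoothed bigram model."""
    context_counts, bigram_counts, vocab = model
    tokens = ["<s>"] + sent
    v = len(vocab)
    return [
        -math.log2((bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v))
        for prev, cur in zip(tokens, tokens[1:])
    ]

def surprisal_variance(sent, model):
    """Variance of surprisal within one sentence (0 means perfectly uniform)."""
    s = surprisals(sent, model)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

def mean_surprisal_variance(sentences, model):
    """Average the per-sentence surprisal variance over a corpus."""
    return sum(surprisal_variance(s, model) for s in sentences) / len(sentences)

# Hypothetical toy corpus; the reversed variant stands in for a counterfactual order.
real = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "saw", "a", "bird"],
    ["a", "bird", "chased", "the", "dog"],
]
reverse = [sent[::-1] for sent in real]

# One model per word-order variant, each scored on its own variant.
real_score = mean_surprisal_variance(real, train_bigram(real))
reverse_score = mean_surprisal_variance(reverse, train_bigram(reverse))
print(f"real: {real_score:.3f}  reverse: {reverse_score:.3f}")
```

On a corpus this small the two numbers say nothing about any language; the sketch only shows the shape of the comparison between a real order and a counterfactual one.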
Related papers
- Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse [54.08750245737734]
We propose that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones.
arXiv Detail & Related papers (2024-10-21T14:42:37Z)
- When does word order matter and when doesn't it? [31.092367724062644]
Language models (LMs) may appear insensitive to word order changes in natural language understanding tasks.
Linguistic redundancy can explain this phenomenon: word order and other linguistic cues provide overlapping, and thus redundant, information.
We quantify how informative word order is using mutual information (MI) between unscrambled and scrambled sentences.
arXiv Detail & Related papers (2024-02-29T04:11:10Z)
- Crosslinguistic word order variation reflects evolutionary pressures of dependency and information locality [4.869029215261254]
About 40% of the world's languages have subject-verb-object order, and about 40% have subject-object-verb order.
We show that variation in word order reflects different ways of balancing competing pressures of dependency locality and information locality.
Our findings suggest that syntactic structure and usage across languages co-adapt to support efficient communication under limited cognitive resources.
arXiv Detail & Related papers (2022-06-09T02:56:53Z)
- Revisiting the Uniform Information Density Hypothesis [44.277066511088634]
We investigate the uniform information density (UID) hypothesis using reading time and acceptability data.
For acceptability judgments, we find clearer evidence that non-uniformity in information density is predictive of lower acceptability.
arXiv Detail & Related papers (2021-09-23T20:41:47Z)
- Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model [66.84264870118723]
We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model.
We provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
arXiv Detail & Related papers (2020-10-09T18:27:55Z)
- A Matter of Framing: The Impact of Linguistic Formalism on Probing Results [69.36678873492373]
Deep pre-trained contextualized encoders like BERT (Devlin et al.) demonstrate remarkable performance on a range of downstream tasks.
Recent research in probing investigates the linguistic knowledge implicitly learned by these models during pre-training.
Can the choice of formalism affect probing results?
We find linguistically meaningful differences in the encoding of semantic role- and proto-role information by BERT depending on the formalism.
arXiv Detail & Related papers (2020-04-30T17:45:16Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)