Revisiting the Uniform Information Density Hypothesis
- URL: http://arxiv.org/abs/2109.11635v1
- Date: Thu, 23 Sep 2021 20:41:47 GMT
- Title: Revisiting the Uniform Information Density Hypothesis
- Authors: Clara Meister, Tiago Pimentel, Patrick Haller, Lena Jäger, Ryan Cotterell, Roger Levy
- Abstract summary: We investigate the uniform information density (UID) hypothesis using reading time and acceptability data.
For acceptability judgments, we find clearer evidence that non-uniformity in information density is predictive of lower acceptability.
- Score: 44.277066511088634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The uniform information density (UID) hypothesis posits a preference among
language users for utterances structured such that information is distributed
uniformly across a signal. While its implications on language production have
been well explored, the hypothesis potentially makes predictions about language
comprehension and linguistic acceptability as well. Further, it is unclear how
uniformity in a linguistic signal -- or lack thereof -- should be measured, and
over which linguistic unit, e.g., the sentence or language level, this
uniformity should hold. Here we investigate these facets of the UID hypothesis
using reading time and acceptability data. While our reading time results are
generally consistent with previous work, they are also consistent with a weakly
super-linear effect of surprisal, which would be compatible with UID's
predictions. For acceptability judgments, we find clearer evidence that
non-uniformity in information density is predictive of lower acceptability. We
then explore multiple operationalizations of UID, motivated by different
interpretations of the original hypothesis, and analyze the scope over which
the pressure towards uniformity is exerted. The explanatory power of a subset
of the proposed operationalizations suggests that the strongest trend may be a
regression towards a mean surprisal across the language, rather than the
phrase, sentence, or document -- a finding that supports a typical
interpretation of UID, namely that it is the byproduct of language users
maximizing the use of a (hypothetical) communication channel.
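As a minimal, illustrative sketch of what such operationalizations can look like (the function names and the two variants below are assumptions for illustration, not the paper's exact measures), a sentence-scoped measure of non-uniformity is the variance of per-token surprisal within a sentence, while a language-scoped variant scores deviation from a corpus-wide mean surprisal, matching the "regression towards a mean surprisal across the language" reading:

```python
import math

def surprisals(token_probs):
    """Per-token surprisal in bits from conditional probabilities p(w_t | w_<t)."""
    return [-math.log2(p) for p in token_probs]

def uid_sentence_variance(token_probs):
    """Sentence-scoped operationalization: variance of surprisal around the
    sentence mean. Lower values indicate a more uniform information profile."""
    s = surprisals(token_probs)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

def uid_language_deviation(token_probs, language_mean_surprisal):
    """Language-scoped operationalization: mean squared deviation of each token's
    surprisal from a corpus-wide mean surprisal."""
    s = surprisals(token_probs)
    return sum((x - language_mean_surprisal) ** 2 for x in s) / len(s)

# Toy usage with made-up conditional probabilities from a language model:
probs = [0.25, 0.05, 0.40, 0.10]
print(uid_sentence_variance(probs))                                # within-sentence uniformity
print(uid_language_deviation(probs, language_mean_surprisal=3.0))  # deviation from language mean
```

Under this sketch, the two functions differ only in the reference point, the sentence's own mean versus a fixed language-level mean, which is exactly the question of scope the abstract raises.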
Related papers
- Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse [54.08750245737734]
We propose that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones.
arXiv Detail & Related papers (2024-10-21T14:42:37Z)
- That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses [2.1781981800541805]
The Uniform Information Density hypothesis posits that speakers optimize the communicative properties of their utterances by avoiding spikes in information.
This paper investigates the impact of UID principles on syntactic reduction, specifically focusing on the optional omission of the connector "that" in English subordinate clauses.
arXiv Detail & Related papers (2024-05-31T14:23:30Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- A Cross-Linguistic Pressure for Uniform Information Density in Word Order [79.54362557462359]
We use computational models to test whether real orders lead to greater information uniformity than counterfactual orders.
Among SVO languages, real word orders consistently have greater uniformity than reverse word orders.
Only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders.
arXiv Detail & Related papers (2023-06-06T14:52:15Z)
- Revisiting Entropy Rate Constancy in Text [43.928576088761844]
The uniform information density hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse.
We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy.
arXiv Detail & Related papers (2023-05-20T03:48:31Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- A Cognitive Regularizer for Language Modeling [36.256053903862956]
We augment the canonical MLE objective for training language models by encoding UID as regularization.
We find that using UID regularization consistently improves perplexity in language models.
We also find that UID-regularized language models are higher-entropy and produce text that is longer and more lexically diverse.
arXiv Detail & Related papers (2021-05-15T05:37:42Z)
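The final entry above describes encoding UID as a regularizer on top of the canonical MLE objective. The snippet below is a minimal PyTorch sketch under assumptions, the variance-of-surprisal penalty and the weight `beta` are illustrative choices, not the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def uid_regularized_loss(logits, targets, beta=0.1):
    """Cross-entropy (MLE) loss plus a UID penalty: the variance of per-token
    surprisal. `beta` is a hypothetical weighting coefficient.
    logits: (seq_len, vocab_size); targets: (seq_len,) of token ids."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Surprisal of each observed token (negative log-probability, in nats).
    token_surprisal = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    ce_loss = token_surprisal.mean()                    # standard MLE term
    uid_penalty = token_surprisal.var(unbiased=False)   # non-uniformity of information
    return ce_loss + beta * uid_penalty

# Toy usage with random logits standing in for a language model's output:
logits = torch.randn(12, 100, requires_grad=True)  # 12 tokens, vocabulary of 100
targets = torch.randint(0, 100, (12,))
loss = uid_regularized_loss(logits, targets)
loss.backward()  # gradients flow through both the MLE term and the UID penalty
```

In this sketch the penalty discourages spiky per-token surprisal profiles during training, which is the sense in which UID is "encoded as regularization" in that entry.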
This list is automatically generated from the titles and abstracts of the papers on this site.