Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse
- URL: http://arxiv.org/abs/2410.16062v1
- Date: Mon, 21 Oct 2024 14:42:37 GMT
- Title: Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse
- Authors: Eleftheria Tsipidi, Franz Nowak, Ryan Cotterell, Ethan Wilcox, Mario Giulianelli, Alex Warstadt
- Abstract summary: We propose that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones.
- Score: 54.08750245737734
- License:
- Abstract: The Uniform Information Density (UID) hypothesis posits that speakers tend to distribute information evenly across linguistic units to achieve efficient communication. Of course, information rate in texts and discourses is not perfectly uniform. While these fluctuations can be viewed as theoretically uninteresting noise on top of a uniform target, another explanation is that UID is not the only functional pressure regulating information content in a language. Speakers may also seek to maintain interest, adhere to writing conventions, and build compelling arguments. In this paper, we propose one such functional pressure; namely that speakers modulate information rate based on location within a hierarchically-structured model of discourse. We term this the Structured Context Hypothesis and test it by predicting the surprisal contours of naturally occurring discourses extracted from large language models using predictors derived from discourse structure. We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones. This work takes an initial step beyond UID to propose testable hypotheses for why the information rate fluctuates in predictable ways.
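As a rough illustration of the recipe described in the abstract, the sketch below estimates a per-sentence surprisal contour with a pretrained language model and regresses it on linear position plus a hierarchical-depth predictor. The toy discourse, the depth annotations, the choice of GPT-2, and the plain least-squares fit are all illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: (1) per-sentence surprisal contour from a pretrained LM,
# (2) regression of that contour on position and a hypothetical discourse-tree
# depth predictor. Illustrative only; the paper uses structure-derived
# predictors and more careful statistical modelling.

import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_surprisal(context: str, sentence: str) -> float:
    """Total surprisal (nats) of `sentence` given the preceding `context`."""
    ctx_ids = tokenizer.encode(context) if context else []
    sent_ids = tokenizer.encode((" " if context else "") + sentence)
    ids = torch.tensor([ctx_ids + sent_ids])
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Surprisal of each sentence token given everything that precedes it.
    for pos in range(len(ctx_ids), len(ctx_ids) + len(sent_ids)):
        if pos == 0:
            continue  # no prediction for the very first token
        total -= log_probs[0, pos - 1, ids[0, pos]].item()
    return total

# Toy discourse with a hypothetical hierarchical annotation:
# depth = nesting level of the sentence in an assumed discourse tree.
discourse = [
    ("The committee met on Monday.", 1),
    ("First, it reviewed the budget.", 2),
    ("The travel line was cut by ten percent.", 3),
    ("Second, it discussed hiring.", 2),
    ("A final report will follow.", 1),
]

surprisals, depths = [], []
context = ""
for sentence, depth in discourse:
    surprisals.append(sentence_surprisal(context, sentence))
    depths.append(depth)
    context = (context + " " + sentence).strip()

# Ordinary least squares: surprisal ~ intercept + linear position + depth.
X = np.column_stack([np.ones(len(discourse)),
                     np.arange(len(discourse)),
                     np.array(depths)])
y = np.array(surprisals)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, position, depth coefficients:", coef)
```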
Related papers
- Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling [16.73336092521471]
This paper aims to remove speaker information by exploiting the structured nature of speech.
A neural network predicts boundaries between speech units, enabling variable-length pooling for event-based representation extraction.
To confirm that the learned representation includes content information but is independent of speaker information, the model was evaluated with libri-light's phonetic ABX task and SUPERB's speaker identification task.
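The toy sketch below illustrates the general idea of variable-length pooling over predicted boundaries: averaging frame-level features within each segment to obtain one event-level vector. The random features, hard-coded boundaries, and hard mean-pooling are placeholders; the paper predicts boundaries with a neural network and uses a soft pooling scheme.

```python
# Toy variable-length pooling: one pooled vector per segment between boundaries.
import torch

frames = torch.randn(20, 8)        # 20 frames, 8-dim features (dummy data)
boundaries = [0, 5, 9, 16, 20]     # assumed segment boundaries

events = torch.stack([
    frames[start:end].mean(dim=0)  # average over each variable-length segment
    for start, end in zip(boundaries[:-1], boundaries[1:])
])
print(events.shape)  # torch.Size([4, 8]) -> four event-level representations
```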
arXiv Detail & Related papers (2024-04-01T01:49:09Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- A Cross-Linguistic Pressure for Uniform Information Density in Word Order [79.54362557462359]
We use computational models to test whether real orders lead to greater information uniformity than counterfactual orders.
Among SVO languages, real word orders consistently have greater uniformity than reverse word orders.
Only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders.
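A minimal sketch of the style of comparison, assuming per-word surprisal estimates are already available (the numbers below are invented) and using surprisal variance as one simple uniformity measure; the paper's models, languages, and metrics are more involved.

```python
# Compare information uniformity of a real word order vs. a counterfactual one.
import numpy as np

def uniformity(surprisals):
    """Lower variance = more uniform information density."""
    return np.var(surprisals)

# Hypothetical per-word surprisals (bits) for a sentence in its real SVO order
# and in a counterfactual reversed order.
real_order    = [4.1, 3.8, 4.5, 4.0, 3.9]
reverse_order = [7.2, 2.1, 6.5, 1.8, 5.9]

print("real order variance:   ", uniformity(real_order))
print("reverse order variance:", uniformity(reverse_order))
# The summarized finding for SVO languages: the real order tends to be the
# more uniform of the two.
```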
arXiv Detail & Related papers (2023-06-06T14:52:15Z)
- CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models [28.750894873827068]
We propose a novel framework, called CUE, which aims to interpret uncertainties inherent in the predictions of PLM-based models.
By comparing the difference in predictive uncertainty between the perturbed and the original text representations, we are able to identify the latent dimensions responsible for uncertainty.
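The sketch below is a schematic version of this perturbation idea: perturb individual latent dimensions of a text representation, re-score, and rank dimensions by how much the predictive uncertainty changes. The linear head, random representation, and entropy-based uncertainty are stand-ins, not CUE's actual components.

```python
# Perturbation-based attribution of predictive uncertainty to latent dimensions.
import torch

torch.manual_seed(0)
dim, n_classes = 16, 3
classifier = torch.nn.Linear(dim, n_classes)  # stand-in for a PLM + head
representation = torch.randn(dim)             # stand-in latent text representation

def predictive_entropy(z: torch.Tensor) -> float:
    probs = torch.softmax(classifier(z), dim=-1)
    return -(probs * probs.log()).sum().item()

baseline = predictive_entropy(representation)
effects = []
for d in range(dim):
    perturbed = representation.clone()
    perturbed[d] += 1.0                       # simple additive perturbation
    effects.append(abs(predictive_entropy(perturbed) - baseline))

top = max(range(dim), key=lambda d: effects[d])
print(f"dimension {top} changes uncertainty the most ({effects[top]:.3f})")
```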
arXiv Detail & Related papers (2023-06-06T11:37:46Z)
- How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases [28.58785395946639]
We show that pre-training can teach language models to rely on hierarchical syntactic features when performing tasks after fine-tuning.
We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus.
arXiv Detail & Related papers (2023-05-31T14:38:14Z)
- Revisiting Entropy Rate Constancy in Text [43.928576088761844]
The uniform information density hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse.
We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy.
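For context, the sketch below shows the shape of the test involved: under entropy rate constancy, per-sentence entropy measured out of context should increase with sentence position, so one can fit a slope to position-indexed surprisal estimates. The numbers are invented; the paper runs this kind of analysis with neural language models over real corpora.

```python
# Slope test in the style of Genzel & Charniak (2002), on made-up numbers.
import numpy as np

positions = np.arange(1, 11)
# Hypothetical mean per-token surprisals (bits) for sentences 1..10 of a document,
# measured without discourse context.
out_of_context_surprisal = np.array(
    [5.2, 5.4, 5.3, 5.5, 5.3, 5.4, 5.2, 5.5, 5.4, 5.3])

slope, intercept = np.polyfit(positions, out_of_context_surprisal, deg=1)
print(f"slope = {slope:.4f} bits/sentence")
# A clearly positive slope would support entropy rate constancy; a slope near
# zero (as in these toy numbers) would not.
```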
arXiv Detail & Related papers (2023-05-20T03:48:31Z)
- Automatic Dialect Density Estimation for African American English [74.44807604000967]
We explore automatic prediction of dialect density for African American English (AAE).
Dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
We show a significant correlation between our predicted and ground-truth dialect density measures for AAE speech in the evaluation corpus.
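Read literally, the definition above can be computed directly, as in the small sketch below; the word list and feature marker are invented, and the paper predicts this quantity automatically from speech rather than from word-level annotations.

```python
# Dialect density = fraction of words carrying a non-standard dialect feature.
def dialect_density(words, has_dialect_feature):
    flags = [has_dialect_feature(w) for w in words]
    return sum(flags) / len(flags) if flags else 0.0

# Toy annotation: a hypothetical set of marked tokens.
marked = {"finna", "ain't"}
utterance = "she finna go but he ain't coming".split()
print(dialect_density(utterance, lambda w: w in marked))  # 2/7 = 0.2857...
```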
arXiv Detail & Related papers (2022-04-03T01:34:48Z)
- Revisiting the Uniform Information Density Hypothesis [44.277066511088634]
We investigate the uniform information density (UID) hypothesis using reading time and acceptability data.
For acceptability judgments, we find clearer evidence that non-uniformity in information density is predictive of lower acceptability.
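The sketch below shows two common ways such non-uniformity can be operationalized over a surprisal sequence, on made-up numbers: the variance of surprisal and the mean squared difference between adjacent surprisals. The paper's actual measures and regressions differ in detail.

```python
# Two simple non-uniformity measures over a per-word surprisal sequence.
import numpy as np

def variance_measure(s):
    return float(np.var(s))

def local_difference_measure(s):
    s = np.asarray(s, dtype=float)
    return float(np.mean(np.diff(s) ** 2))

smooth = [4.0, 4.2, 3.9, 4.1, 4.0]   # hypothetical, fairly uniform sentence
spiky  = [1.5, 7.8, 2.0, 8.3, 1.9]   # hypothetical, highly non-uniform sentence

for name, s in [("smooth", smooth), ("spiky", spiky)]:
    print(name, variance_measure(s), local_difference_measure(s))
# The summarized finding: higher non-uniformity goes with lower acceptability.
```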
arXiv Detail & Related papers (2021-09-23T20:41:47Z)
- Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis [68.76620947298595]
Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text.
We propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody.
arXiv Detail & Related papers (2021-06-15T18:03:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.