That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses
- URL: http://arxiv.org/abs/2405.20833v1
- Date: Fri, 31 May 2024 14:23:30 GMT
- Title: That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses
- Authors: Ella Rabinovich
- Abstract summary: The Uniform Information Density hypothesis posits that speakers optimize the communicative properties of their utterances by avoiding spikes in information.
This paper investigates the impact of UID principles on syntactic reduction, specifically focusing on the optional omission of the connector "that" in English subordinate clauses.
- Score: 2.1781981800541805
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Uniform Information Density (UID) hypothesis posits that speakers optimize the communicative properties of their utterances by avoiding spikes in information, thereby maintaining a relatively uniform information profile over time. This paper investigates the impact of UID principles on syntactic reduction, specifically focusing on the optional omission of the connector "that" in English subordinate clauses. Building upon previous research, we extend our investigation to a larger corpus of written English, utilize contemporary large language models (LLMs), and extend the information-uniformity principles with the notion of entropy to estimate UID manifestations in the use case of syntactic reduction choices.
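The core UID measurement the abstract describes can be illustrated with a small sketch: compute per-token surprisal under a language model, then score a sentence's information profile by how uniform those surprisals are. The snippet below is a minimal, hypothetical illustration only; it stands in an add-one-smoothed unigram model for the LLM-based conditional probabilities the paper actually uses, and the toy corpus and sentence pair are invented for demonstration.

```python
import math
from collections import Counter

def unigram_surprisals(tokens, counts, total, vocab_size):
    """Per-token surprisal -log2 p(w) under an add-one-smoothed unigram model.

    A real UID study would use an LLM's conditional probabilities
    p(w_i | w_1..w_{i-1}) instead of unigram frequencies.
    """
    return [-math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens]

def uid_variance(surprisals):
    """One common UID operationalization: variance of per-token surprisal.

    Lower variance means a flatter (more uniform) information profile.
    """
    mean = sum(surprisals) / len(surprisals)
    return sum((s - mean) ** 2 for s in surprisals) / len(surprisals)

# Toy corpus and sentence pair (invented for illustration): the same
# clause with and without the optional connector "that".
corpus = "i think that she said that he knew the answer".split()
counts = Counter(corpus)
total, vocab = len(corpus), len(counts)

for sent in ("i think that he knew".split(), "i think he knew".split()):
    s = unigram_surprisals(sent, counts, total, vocab)
    print(" ".join(sent), "-> UID variance:", round(uid_variance(s), 3))
```

Under the UID hypothesis, a speaker would prefer whichever variant yields the flatter surprisal profile; the paper's contribution is estimating such profiles at scale with contemporary LLMs and entropy-based extensions.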
Related papers
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z) - Dense X Retrieval: What Retrieval Granularity Should We Use? [59.359325855708974]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid.
Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z) - Topic-DPR: Topic-based Prompts for Dense Passage Retrieval [6.265789210037749]
We present Topic-DPR, a dense passage retrieval model that uses topic-based prompts.
We introduce a novel positive and negative sampling strategy, leveraging semi-structured data to boost dense retrieval efficiency.
arXiv Detail & Related papers (2023-10-10T13:45:24Z) - Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z) - Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information [68.89000132126536]
This work proposes to use inter-utterance linguistic information to improve the performance of prosodic structure prediction (PSP).
Our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH).
arXiv Detail & Related papers (2023-08-31T09:19:15Z) - Revisiting Entropy Rate Constancy in Text [43.928576088761844]
The uniform information density hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse.
We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy.
arXiv Detail & Related papers (2023-05-20T03:48:31Z) - Revisiting the Uniform Information Density Hypothesis [44.277066511088634]
We investigate the uniform information density (UID) hypothesis using reading time and acceptability data.
For acceptability judgments, we find clearer evidence that non-uniformity in information density is predictive of lower acceptability.
arXiv Detail & Related papers (2021-09-23T20:41:47Z) - A Cognitive Regularizer for Language Modeling [36.256053903862956]
We augment the canonical MLE objective for training language models by encoding UID as regularization.
We find that using UID regularization consistently improves perplexity in language models.
We also find that UID-regularized language models are higher-entropy and produce text that is longer and more lexically diverse.
arXiv Detail & Related papers (2021-05-15T05:37:42Z) - Semantic Extractor-Paraphraser based Abstractive Summarization [40.05739160204135]
We propose an extractor-paraphraser based abstractive summarization system that exploits semantic overlap.
Our model outperforms the state-of-the-art baselines in terms of ROUGE, METEOR and word mover similarity (WMS).
arXiv Detail & Related papers (2021-05-04T05:24:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.