A decomposition of book structure through ousiometric fluctuations in
cumulative word-time
- URL: http://arxiv.org/abs/2208.09496v4
- Date: Fri, 12 May 2023 00:54:52 GMT
- Title: A decomposition of book structure through ousiometric fluctuations in
cumulative word-time
- Authors: Mikaela Irene Fudolig, Thayer Alshaabi, Kathryn Cramer, Christopher M.
Danforth, Peter Sheridan Dodds
- Abstract summary: We look at how words change over the course of a book as a function of the number of words completed, rather than the fraction of the book.
We find that shorter books exhibit only a general trend, while longer books have fluctuations in addition to the general trend.
Our findings suggest that, in the ousiometric sense, longer books are not expanded versions of shorter books, but are more similar in structure to a concatenation of shorter texts.
- Score: 1.181206257787103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While quantitative methods have been used to examine changes in word usage in
books, studies have focused on overall trends, such as the shapes of
narratives, which are independent of book length. We instead look at how words
change over the course of a book as a function of the number of words, rather
than the fraction of the book, completed at any given point; we define this
measure as "cumulative word-time". Using ousiometrics, a reinterpretation of
the valence-arousal-dominance framework of meaning obtained from semantic
differentials, we convert text into time series of power and danger scores in
cumulative word-time. Each time series is then decomposed using empirical mode
decomposition into a sum of constituent oscillatory modes and a non-oscillatory
trend. By comparing the decomposition of the original power and danger time
series with those derived from shuffled text, we find that shorter books
exhibit only a general trend, while longer books have fluctuations in addition
to the general trend. These fluctuations typically have a period of a few
thousand words regardless of the book length or library classification code,
but vary depending on the content and structure of the book. Our findings
suggest that, in the ousiometric sense, longer books are not expanded versions
of shorter books, but are more similar in structure to a concatenation of
shorter texts. Further, they are consistent with editorial practices that
require longer texts to be broken down into sections, such as chapters. Our
method also provides a data-driven denoising approach that works for texts of
various lengths, in contrast to the more traditional approach of using large
window sizes that may inadvertently smooth out relevant information, especially
for shorter texts. These results open up avenues for future work in
computational literary analysis, particularly the measurement of a basic unit
of narrative.
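As a rough illustration of the pipeline the abstract describes (score each word with a lexicon, smooth in cumulative word-time, decompose with empirical mode decomposition, and compare against shuffled text), here is a minimal Python sketch. It is not the authors' code: the four-word lexicon, window size, and random toy text are placeholders rather than the paper's ousiometric data, and the decomposition assumes the third-party PyEMD package.

```python
# Minimal sketch of the cumulative word-time + EMD pipeline described above.
# Assumptions: a toy word-to-danger-score lexicon standing in for the
# ousiometric lexicon, and the PyEMD package (pip install EMD-signal).
import re

import numpy as np
from PyEMD import EMD


def word_scores(text: str, lexicon: dict[str, float]) -> np.ndarray:
    """Score each successive lexicon word, indexed by cumulative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return np.array([lexicon[w] for w in words if w in lexicon])


def rolling_mean(x: np.ndarray, window: int) -> np.ndarray:
    """Lightly smooth the raw word-score sequence in cumulative word-time."""
    return np.convolve(x, np.ones(window) / window, mode="valid")


def decompose(series: np.ndarray):
    """Empirical mode decomposition: oscillatory modes (IMFs) plus a trend."""
    emd = EMD()
    emd.emd(series)
    return emd.get_imfs_and_residue()


# Toy stand-in for a book: random words drawn from the tiny lexicon.
lexicon = {"danger": 0.9, "war": 0.8, "calm": -0.5, "safe": -0.7}
rng = np.random.default_rng(0)
text = " ".join(rng.choice(list(lexicon), size=5000))

scores = word_scores(text, lexicon)
danger = rolling_mean(scores, window=50)
imfs, trend = decompose(danger)

# Shuffled-text control: permute the word scores before smoothing. For a
# real book, modes present here but absent in the control would indicate
# structure beyond what word frequencies alone produce.
danger_null = rolling_mean(rng.permutation(scores), window=50)
imfs_null, _ = decompose(danger_null)

print(f"IMFs in text: {len(imfs)}; IMFs in shuffled control: {len(imfs_null)}")
```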
Related papers
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
- Textual Stylistic Variation: Choices, Genres and Individuals [0.8057441774248633]
This chapter argues for more informed target metrics for the statistical processing of stylistic variation in text collections.
It discusses variation given by genre and contrasts it with variation occasioned by individual choice.
arXiv Detail & Related papers (2022-05-01T16:39:49Z)
- Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning [92.07643510310766]
Temporal grounding in videos aims to localize one target video segment that semantically corresponds to a given query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We empirically find that existing methods fail to generalize to queries with novel combinations of seen words.
We propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies.
arXiv Detail & Related papers (2022-03-24T12:55:23Z)
- Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution [74.27826764855911]
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
arXiv Detail & Related papers (2021-10-27T06:25:31Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper addresses the resulting difficulty of measuring semantic distance with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) measure to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate at the sentence level, we consistently use a broader context.
arXiv Detail & Related papers (2020-12-07T12:09:37Z)
- What time is it? Temporal Analysis of Novels [10.481474734742486]
We construct a data set of hourly time phrases from 52,183 fictional books.
We then construct a time-of-day classification model that achieves an average error of 2.27 hours.
We show that, by analyzing a book as a whole using dynamic programming over breakpoints, we can roughly partition it into segments that each correspond to a particular time of day.
arXiv Detail & Related papers (2020-11-09T01:11:55Z)
- Paragraph-level Commonsense Transformers with Recurrent Memory [77.4133779538797]
We train a discourse-aware model that incorporates paragraph-level information to generate coherent commonsense inferences from narratives.
Our results show that the resulting model, PARA-COMET, outperforms sentence-level baselines, particularly in generating inferences that are both coherent and novel.
arXiv Detail & Related papers (2020-10-04T05:24:12Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts [0.15833270109954134]
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content.
We introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts.
We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences.
arXiv Detail & Related papers (2020-08-05T17:27:11Z)
- Heaps' law and Heaps functions in tagged texts: Evidences of their linguistic relevance [0.0]
We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English.
We analyze the progressive appearance of new words of each tag along each individual text.
arXiv Detail & Related papers (2020-01-07T17:05:16Z)