Comparative Computational Analysis of Global Structure in Canonical,
Non-Canonical and Non-Literary Texts
- URL: http://arxiv.org/abs/2008.10906v1
- Date: Tue, 25 Aug 2020 09:37:06 GMT
- Authors: Mahdi Mohseni, Volker Gast, Christoph Redies
- Abstract summary: The study hypothesizes that three text types (non-literary, literary/canonical and literary/non-canonical) differ systematically in structural design features that correlate with aesthetic responses in readers.
Two aspects of global structure are investigated: variability and self-similar (fractal) patterns, which reflect long-range correlations along texts.
The results show that low-level properties of texts are better discriminators than high-level properties for the three text types under analysis.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study investigates global properties of literary and non-literary texts.
Within the literary texts, a distinction is made between canonical and
non-canonical works. The central hypothesis of the study is that the three text
types (non-literary, literary/canonical and literary/non-canonical) exhibit
systematic differences with respect to structural design features as correlates
of aesthetic responses in readers. To investigate these differences, we
compiled a corpus containing texts of the three categories of interest, the
Jena Textual Aesthetics Corpus. Two aspects of global structure are
investigated, variability and self-similar (fractal) patterns, which reflect
long-range correlations along texts. We use four types of basic observations,
(i) the frequency of POS-tags per sentence, (ii) sentence length, (iii) lexical
diversity in chunks of text, and (iv) the distribution of topic probabilities
in chunks of texts. These basic observations are grouped into two more general
categories, (a) the low-level properties (i) and (ii), which are observed at
the level of the sentence (reflecting linguistic decoding), and (b) the
high-level properties (iii) and (iv), which are observed at the textual level
(reflecting comprehension). The basic observations are transformed into time
series, and these time series are subject to multifractal detrended fluctuation
analysis (MFDFA). Our results show that low-level properties of texts are
better discriminators than high-level properties, for the three text types
under analysis. Canonical literary texts differ from non-canonical ones
primarily in terms of variability. Fractality seems to be a universal feature
of text, more pronounced in non-literary than in literary texts. Beyond the
specific results of the study, we intend to open up new perspectives on the
experimental study of textual aesthetics.
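The pipeline described in the abstract (observations turned into time series, then analysed with MFDFA) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the naive sentence splitter, the use of type-token ratio as the lexical diversity measure, the chunk size, and the detrending order are all assumptions made for illustration.

```python
import re
import numpy as np


def sentence_lengths(text):
    """Observation (ii): number of whitespace-separated tokens per sentence."""
    # Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return np.array([len(s.split()) for s in sentences], dtype=float)


def chunk_type_token_ratio(tokens, chunk_size=100):
    """Observation (iii): lexical diversity per chunk, here as type-token ratio (assumption)."""
    n_chunks = len(tokens) // chunk_size
    chunks = (tokens[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks))
    return np.array([len(set(c)) / chunk_size for c in chunks])


def mfdfa(series, scales, q_values, order=1):
    """Return generalized Hurst exponents h(q) and fluctuation functions F_q(s)."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())  # step 1: integrated, mean-centred profile
    n = len(profile)
    q_values = np.asarray(q_values, dtype=float)
    fluct = np.empty((len(q_values), len(scales)))
    for j, s in enumerate(scales):
        n_seg = n // s
        # step 2: non-overlapping windows taken from the start and from the end
        segs = np.concatenate([
            profile[:n_seg * s].reshape(n_seg, s),
            profile[n - n_seg * s:].reshape(n_seg, s),
        ])
        t = np.arange(s)
        # step 3: fit and remove a polynomial trend in every window
        coeffs = np.polyfit(t, segs.T, order)      # shape (order + 1, 2 * n_seg)
        trend = np.vander(t, order + 1) @ coeffs   # shape (s, 2 * n_seg)
        var = np.mean((segs.T - trend) ** 2, axis=0)
        # step 4: q-th order average of the residual variances (log-average for q == 0)
        for i, q in enumerate(q_values):
            if q == 0:
                fluct[i, j] = np.exp(0.5 * np.mean(np.log(var)))
            else:
                fluct[i, j] = np.mean(var ** (q / 2)) ** (1 / q)
    # step 5: h(q) is the slope of log F_q(s) against log s
    log_s = np.log(scales)
    hurst = np.array([np.polyfit(log_s, np.log(f), 1)[0] for f in fluct])
    return hurst, fluct


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=4096)  # stand-in for a real sentence-length series
    h, _ = mfdfa(noise, scales=[16, 32, 64, 128, 256], q_values=[-2, 0, 2])
    print(dict(zip([-2, 0, 2], np.round(h, 2))))
```

For an uncorrelated series, h(2) is expected to lie near 0.5, while values above 0.5 indicate persistent long-range correlations. A spread of h(q) across q values is the usual indicator of multifractality. In practice the input would be one of the four observation series built from a text rather than synthetic noise.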
Related papers
- Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach [4.161155428666988]
Stylometry aims to distinguish authors by analyzing literary traits assumed to reflect semi-conscious choices distinct from elements like genre or theme.
While some literary properties, such as thematic content, are likely to manifest as correlations between adjacent text units, others, like authorial style, may be independent thereof.
We introduce a hypothesis-testing approach to evaluate the influence of sequentially correlated literary properties on text classification.
arXiv Detail & Related papers (2024-11-07T18:28:40Z)
- Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs [19.073560504913356]
The line between human-crafted and machine-generated texts has become increasingly blurred.
This paper delves into the inquiry of identifying discernible and unique linguistic properties in texts that were written by humans.
arXiv Detail & Related papers (2024-02-16T11:20:30Z)
- Complex systems approach to natural language [0.0]
This review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science.
Three main complexity-related research trends in quantitative linguistics are covered.
arXiv Detail & Related papers (2024-01-05T12:01:26Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- How Do In-Context Examples Affect Compositional Generalization? [86.57079616209474]
In this paper, we present CoFe, a test suite to investigate in-context compositional generalization.
We find that the compositional generalization performance can be easily affected by the selection of in-context examples.
Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple.
arXiv Detail & Related papers (2023-05-08T16:32:18Z)
- A Statistical Exploration of Text Partition Into Constituents: The Case of the Priestly Source in the Books of Genesis and Exodus [1.8780017602640042]
We present a pipeline for a statistical textual exploration, offering a stylometry-based explanation and statistical validation of a hypothesized partition of a text.
We apply our pipeline to the first two books in the Bible, where one stylistic component stands out in the eyes of biblical scholars, namely, the Priestly component.
arXiv Detail & Related papers (2023-05-03T15:07:42Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
The representation of scientific-specific meaning is standardised by replacing the situation representations, rather than psychological properties.
The research in this paper lays the basis for the geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z)
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
- Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution [74.27826764855911]
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
arXiv Detail & Related papers (2021-10-27T06:25:31Z)
- Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set [0.0]
We show that quasi error-free text classification and authorship recognition are possible across the entire GLEC with a method using the same set of five style and five content features.
Our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
arXiv Detail & Related papers (2020-10-21T07:39:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.