A Statistical Exploration of Text Partition Into Constituents: The Case
of the Priestly Source in the Books of Genesis and Exodus
- URL: http://arxiv.org/abs/2305.02170v3
- Date: Sat, 10 Jun 2023 07:57:22 GMT
- Authors: Gideon Yoffe and Axel Bühler and Nachum Dershowitz and Israel
Finkelstein and Eli Piasetzky and Thomas Römer and Barak Sober
- Abstract summary: We present a pipeline for a statistical textual exploration, offering a stylometry-based explanation and statistical validation of a hypothesized partition of a text.
We apply our pipeline to the first two books in the Bible, where one stylistic component stands out in the eyes of biblical scholars, namely, the Priestly component.
- Score: 1.8780017602640042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a pipeline for a statistical textual exploration, offering a
stylometry-based explanation and statistical validation of a hypothesized
partition of a text. Given a parameterization of the text, our pipeline: (1)
detects literary features yielding the optimal overlap between the hypothesized
and unsupervised partitions, (2) performs a hypothesis-testing analysis to
quantify the statistical significance of the optimal overlap, while conserving
implicit correlations between units of text that are more likely to be grouped,
and (3) extracts and quantifies the importance of features most responsible for
the classification, estimates their statistical stability and cluster-wise
abundance.
We apply our pipeline to the first two books in the Bible, where one
stylistic component stands out in the eyes of biblical scholars, namely, the
Priestly component. We identify and explore statistically significant stylistic
differences between the Priestly and non-Priestly components.
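Step (2) of the pipeline above can be illustrated with a minimal permutation-test sketch: measure the overlap between the hypothesized partition and an unsupervised one, then estimate how often random relabelings reach that overlap. All names and the toy data below are illustrative, and this plain label-shuffling version deliberately ignores the correlations between adjacent text units that the paper's actual test conserves.

```python
# Hedged sketch of the hypothesis-testing step: how significant is the
# overlap between a hypothesized binary partition of text units and an
# unsupervised clustering? (Illustrative only; the paper's test uses
# correlation-conserving permutations, not plain shuffling.)
import random

def overlap(a, b):
    """Fraction of units on which two binary partitions agree,
    up to relabeling of the two clusters."""
    agree = sum(x == y for x, y in zip(a, b))
    return max(agree, len(a) - agree) / len(a)

def permutation_pvalue(hypothesized, unsupervised, n_perm=10_000, seed=0):
    """Estimate P(overlap >= observed) under random relabeling of units."""
    rng = random.Random(seed)
    observed = overlap(hypothesized, unsupervised)
    shuffled = list(hypothesized)  # work on a copy
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if overlap(shuffled, unsupervised) >= observed:
            count += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (count + 1) / (n_perm + 1)

# Toy example: 12 text units, hypothesized P / non-P labels vs. a clustering.
hyp = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
uns = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
p = permutation_pvalue(hyp, uns)
```

A small p-value indicates that the observed agreement between the hypothesized and unsupervised partitions is unlikely under chance relabeling; in the paper's setting, the permutation scheme is constrained so that text units likely to be grouped together stay correlated under the null.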
Related papers
- Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach [4.161155428666988]
Stylometry aims to distinguish authors by analyzing literary traits assumed to reflect semi-conscious choices distinct from elements like genre or theme.
While some literary properties, such as thematic content, are likely to manifest as correlations between adjacent text units, others, like authorial style, may be independent thereof.
We introduce a hypothesis-testing approach to evaluate the influence of sequentially correlated literary properties on text classification.
arXiv Detail & Related papers (2024-11-07T18:28:40Z)
- Critical biblical studies via word frequency analysis: unveiling text authorship [7.2762881851201255]
We aim to differentiate between three distinct authors across numerous chapters spanning the first nine books of the Bible.
Our analysis indicates that the first two authors (D and DtrH) are much more closely related compared to P, a fact that aligns with expert assessments.
arXiv Detail & Related papers (2024-10-24T22:08:38Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- Statistical Depth for Ranking and Characterizing Transformer-Based Text Embeddings [1.321681963474017]
A statistical depth is a function for ranking k-dimensional objects by measuring centrality with respect to some observed k-dimensional distribution.
We adopt a statistical depth for measuring distributions of transformer-based text embeddings, which we call transformer-based text embedding (TTE) depth, and introduce its practical use for both modeling and distributional inference in NLP pipelines.
arXiv Detail & Related papers (2023-10-23T15:02:44Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document into the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Comprehensive Studies for Arbitrary-shape Scene Text Detection [78.50639779134944]
We propose a unified framework for the bottom-up based scene text detection methods.
Under the unified framework, we ensure the consistent settings for non-core modules.
These comprehensive investigations and analyses reveal the advantages and disadvantages of previous models.
arXiv Detail & Related papers (2021-07-25T13:18:55Z)
- Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech [16.45773135100367]
We introduce Kathaka, a model trained with a novel two-stage training process for neural speech synthesis.
In Stage I, we learn a prosodic distribution at the sentence level from mel-spectrograms available during training.
In Stage II, we propose a novel method to sample from this learnt prosodic distribution using the contextual information available in text.
arXiv Detail & Related papers (2020-11-04T12:20:21Z)
- Pareto Probing: Trading Off Accuracy for Complexity [87.09294772742737]
We argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance.
Our experiments with dependency parsing reveal a wide gap in syntactic knowledge between contextual and non-contextual representations.
arXiv Detail & Related papers (2020-10-05T17:27:31Z)
- Comparative Computational Analysis of Global Structure in Canonical, Non-Canonical and Non-Literary Texts [0.0]
Three text types (non-literary, literary/canonical and literary/non-canonical) exhibit systematic differences with respect to structural design features as correlates of aesthetic responses in readers.
Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts.
Our results show that low-level properties of texts are better discriminators than high-level properties, for the three text types under analysis.
arXiv Detail & Related papers (2020-08-25T09:37:06Z)
- A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text [68.8204255655161]
This work introduces a new method to consider subjectivity and general context dependency in text analysis.
By using similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark.
This method could be applied to all the cases where evaluating subjectivity is relevant to understand the relative value or meaning of a text.
arXiv Detail & Related papers (2020-05-12T21:26:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.