A Statistical Exploration of Text Partition Into Constituents: The Case
of the Priestly Source in the Books of Genesis and Exodus
- URL: http://arxiv.org/abs/2305.02170v3
- Date: Sat, 10 Jun 2023 07:57:22 GMT
- Authors: Gideon Yoffe and Axel Bühler and Nachum Dershowitz and Israel
Finkelstein and Eli Piasetzky and Thomas Römer and Barak Sober
- Abstract summary: We present a pipeline for a statistical textual exploration, offering a stylometry-based explanation and statistical validation of a hypothesized partition of a text.
We apply our pipeline to the first two books in the Bible, where one stylistic component stands out in the eyes of biblical scholars, namely, the Priestly component.
- Score: 1.8780017602640042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a pipeline for a statistical textual exploration, offering a
stylometry-based explanation and statistical validation of a hypothesized
partition of a text. Given a parameterization of the text, our pipeline: (1)
detects literary features yielding the optimal overlap between the hypothesized
and unsupervised partitions, (2) performs a hypothesis-testing analysis to
quantify the statistical significance of the optimal overlap, while conserving
implicit correlations between units of text that are more likely to be grouped,
and (3) extracts and quantifies the importance of features most responsible for
the classification, estimates their statistical stability and cluster-wise
abundance.
We apply our pipeline to the first two books in the Bible, where one
stylistic component stands out in the eyes of biblical scholars, namely, the
Priestly component. We identify and explore statistically significant stylistic
differences between the Priestly and non-Priestly components.
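Step (2) of the pipeline above can be illustrated with a minimal permutation-test sketch: measure the overlap between the hypothesized partition and an unsupervised one, then estimate how often random relabelings reach that overlap. All names and the toy data below are illustrative, and this plain label-shuffling version deliberately ignores the correlations between adjacent text units that the paper's actual test conserves.

```python
# Hedged sketch of the hypothesis-testing step: how significant is the
# overlap between a hypothesized binary partition of text units and an
# unsupervised clustering? (Illustrative only; the paper's test uses
# correlation-conserving permutations, not plain shuffling.)
import random

def overlap(a, b):
    """Fraction of units on which two binary partitions agree,
    up to relabeling of the two clusters."""
    agree = sum(x == y for x, y in zip(a, b))
    return max(agree, len(a) - agree) / len(a)

def permutation_pvalue(hypothesized, unsupervised, n_perm=10_000, seed=0):
    """Estimate P(overlap >= observed) under random relabeling of units."""
    rng = random.Random(seed)
    observed = overlap(hypothesized, unsupervised)
    shuffled = list(hypothesized)  # work on a copy
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if overlap(shuffled, unsupervised) >= observed:
            count += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (count + 1) / (n_perm + 1)

# Toy example: 12 text units, hypothesized P / non-P labels vs. a clustering.
hyp = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
uns = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
p = permutation_pvalue(hyp, uns)
```

A small p-value indicates that the observed agreement between the hypothesized and unsupervised partitions is unlikely under chance relabeling; in the paper's setting, the permutation scheme is constrained so that text units likely to be grouped together stay correlated under the null.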
Related papers
- Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach [4.161155428666988]
Stylometry aims to distinguish authors by analyzing literary traits assumed to reflect semi-conscious choices distinct from elements like genre or theme.
While some literary properties, such as thematic content, are likely to manifest as correlations between adjacent text units, others, like authorial style, may be independent thereof.
We introduce a hypothesis-testing approach to evaluate the influence of sequentially correlated literary properties on text classification.
arXiv Detail & Related papers (2024-11-07T18:28:40Z)
- Critical biblical studies via word frequency analysis: unveiling text authorship [7.2762881851201255]
We aim to differentiate between three distinct authors across numerous chapters spanning the first nine books of the Bible.
Our analysis indicates that the first two authors (D and DtrH) are much more closely related compared to P, a fact that aligns with expert assessments.
arXiv Detail & Related papers (2024-10-24T22:08:38Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- Statistical Depth for Ranking and Characterizing Transformer-Based Text Embeddings [1.321681963474017]
A statistical depth is a function for ranking k-dimensional objects by measuring centrality with respect to some observed k-dimensional distribution.
We adopt a statistical depth for measuring distributions of transformer-based text embeddings, which we call transformer-based text embedding (TTE) depth, and introduce its practical use for both modeling and distributional inference in NLP pipelines.
arXiv Detail & Related papers (2023-10-23T15:02:44Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document into the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Comprehensive Studies for Arbitrary-shape Scene Text Detection [78.50639779134944]
We propose a unified framework for the bottom-up based scene text detection methods.
Under the unified framework, we ensure the consistent settings for non-core modules.
These comprehensive investigations and analyses reveal the advantages and disadvantages of previous models.
arXiv Detail & Related papers (2021-07-25T13:18:55Z)
- Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech [16.45773135100367]
We introduce Kathaka, a model trained with a novel two-stage training process for neural speech synthesis.
In Stage I, we learn a prosodic distribution at the sentence level from mel-spectrograms available during training.
In Stage II, we propose a novel method to sample from this learnt prosodic distribution using the contextual information available in text.
arXiv Detail & Related papers (2020-11-04T12:20:21Z)
- Pareto Probing: Trading Off Accuracy for Complexity [87.09294772742737]
We argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance.
Our experiments with dependency parsing reveal a wide gap in syntactic knowledge between contextual and non-contextual representations.
arXiv Detail & Related papers (2020-10-05T17:27:31Z)
- Comparative Computational Analysis of Global Structure in Canonical, Non-Canonical and Non-Literary Texts [0.0]
Three text types (non-literary, literary/canonical and literary/non-canonical) exhibit systematic differences with respect to structural design features as correlates of aesthetic responses in readers.
Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts.
Our results show that low-level properties of texts are better discriminators than high-level properties, for the three text types under analysis.
arXiv Detail & Related papers (2020-08-25T09:37:06Z)
- A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text [68.8204255655161]
This work introduces a new method to consider subjectivity and general context dependency in text analysis.
By using similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark.
This method could be applied to all the cases where evaluating subjectivity is relevant to understand the relative value or meaning of a text.
arXiv Detail & Related papers (2020-05-12T21:26:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.