Critical biblical studies via word frequency analysis: unveiling text authorship
- URL: http://arxiv.org/abs/2410.19883v1
- Date: Thu, 24 Oct 2024 22:08:38 GMT
- Title: Critical biblical studies via word frequency analysis: unveiling text authorship
- Authors: Shira Faigenbaum-Golovin, Alon Kipnis, Axel Bühler, Eli Piasetzky, Thomas Römer, Israel Finkelstein,
- Abstract summary: We aim to differentiate between three distinct authors across numerous chapters spanning the first nine books of the Bible.
Our analysis indicates that the first two authors (D and DtrH) are much more closely related compared to P, a fact that aligns with expert assessments.
- Score: 7.2762881851201255
- License:
- Abstract: The Bible, a product of an extensive and intricate process of oral-written transmission spanning centuries, obscures the contours of its earlier recensions. Debate rages over determining the existing layers and identifying the date of composition and historical background of the biblical texts. Traditional manual methodologies have grappled with authorship challenges through scrupulous textual criticism, employing linguistic, stylistic, inner-biblical, and historical criteria. Despite recent progress in computer-assisted analysis, many patterns still need to be uncovered in Biblical Texts. In this study, we address the question of authorship of biblical texts by employing statistical analysis to the frequency of words using a method that is particularly sensitive to deviations in frequencies associated with a few words out of potentially many. We aim to differentiate between three distinct authors across numerous chapters spanning the first nine books of the Bible. In particular, we examine 50 chapters labeled according to biblical exegesis considerations into three corpora (D, DtrH, and P). Without prior assumptions about author identity, our approach leverages subtle differences in word frequencies to distinguish among the three corpora and identify author-dependent linguistic properties. Our analysis indicates that the first two authors (D and DtrH) are much more closely related compared to P, a fact that aligns with expert assessments. Additionally, we attain high accuracy in attributing authorship by evaluating the similarity of each chapter with the reference corpora. This study sheds new light on the authorship of biblical texts by providing interpretable, statistically significant evidence that there are different linguistic characteristics of biblical authors and that these differences can be identified.
Related papers
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z) - Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z) - Large language model for Bible sentiment analysis: Sermon on the Mount [1.8804426519412474]
We use sentiment analysis for studying selected chapters of the Bible.
These chapters are known as the Sermon on the Mount.
We detect different levels of humour, optimism, and empathy in the respective chapters that were used by Jesus to deliver his message.
arXiv Detail & Related papers (2024-01-01T07:35:29Z) - Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features can not be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z) - A Statistical Exploration of Text Partition Into Constituents: The Case
of the Priestly Source in the Books of Genesis and Exodus [1.8780017602640042]
We present a pipeline for a statistical textual exploration, offering a stylometry-based explanation and statistical validation of a hypothesized partition of a text.
We apply our pipeline to the first two books in the Bible, where one stylistic component stands out in the eyes of biblical scholars, namely, the Priestly component.
arXiv Detail & Related papers (2023-05-03T15:07:42Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn textbfauthorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship
Attribution [74.27826764855911]
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
arXiv Detail & Related papers (2021-10-27T06:25:31Z) - Sensing Ambiguity in Henry James' "The Turn of the Screw" [0.8528384027684192]
This work brings together computational text analysis and literary analysis to demonstrate the extent to which ambiguity in certain texts plays a key role in shaping meaning.
We revisit the discussion, well known in the humanities, about the role ambiguity plays in Henry James' 19th century novella, The Turn of the Screw.
We demonstrate that cosine similarity and word mover's distance are sensitive enough to detect ambiguity in its most subtle literary form.
arXiv Detail & Related papers (2020-11-21T17:53:41Z) - Artificial intelligence based writer identification generates new
evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the
Great Isaiah Scroll (1QIsaa) [5.285396202883411]
We use pattern recognition and artificial intelligence techniques to innovate the palaeography of the scrolls regarding writer identification.
Although many scholars believe that 1QIsaa was written by one scribe, we report new evidence for a breaking point in the series of columns in this scroll.
This study sheds new light on the Bible's ancient scribal culture by providing new, tangible evidence that ancient biblical texts were not copied by a single scribe only but that multiple scribes could closely collaborate on one particular manuscript.
arXiv Detail & Related papers (2020-10-27T17:36:18Z) - Improving Machine Reading Comprehension with Contextualized Commonsense
Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension.
We propose to represent relations implicitly by situating structured knowledge in a context.
We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z) - Comparative Computational Analysis of Global Structure in Canonical,
Non-Canonical and Non-Literary Texts [0.0]
Three text types (non-literary, literary/canonical and literary/non-canonical) exhibit systematic differences with respect to structural design features as correlates of aesthetic responses in readers.
Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts.
Our results show that low-level properties of texts are better discriminators than high-level properties, for the three text types under analysis.
arXiv Detail & Related papers (2020-08-25T09:37:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.