Related papers: Critical biblical studies via word frequency analysis: unveiling text authorship

URL: http://arxiv.org/abs/2410.19883v1
Date: Thu, 24 Oct 2024 22:08:38 GMT
Title: Critical biblical studies via word frequency analysis: unveiling text authorship
Authors: Shira Faigenbaum-Golovin, Alon Kipnis, Axel Bühler, Eli Piasetzky, Thomas Römer, Israel Finkelstein
Abstract summary: We aim to differentiate between three distinct authors across numerous chapters spanning the first nine books of the Bible. Our analysis indicates that the first two authors (D and DtrH) are much more closely related compared to P, a fact that aligns with expert assessments.
Score: 7.2762881851201255
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Bible, a product of an extensive and intricate process of oral-written transmission spanning centuries, obscures the contours of its earlier recensions. Debate rages over determining the existing layers and identifying the date of composition and historical background of the biblical texts. Traditional manual methodologies have grappled with authorship challenges through scrupulous textual criticism, employing linguistic, stylistic, inner-biblical, and historical criteria. Despite recent progress in computer-assisted analysis, many patterns still need to be uncovered in biblical texts. In this study, we address the question of authorship of biblical texts by applying statistical analysis to word frequencies, using a method that is particularly sensitive to deviations in frequencies associated with a few words out of potentially many. We aim to differentiate between three distinct authors across numerous chapters spanning the first nine books of the Bible. In particular, we examine 50 chapters labeled according to biblical exegesis considerations into three corpora (D, DtrH, and P). Without prior assumptions about author identity, our approach leverages subtle differences in word frequencies to distinguish among the three corpora and identify author-dependent linguistic properties. Our analysis indicates that the first two authors (D and DtrH) are much more closely related compared to P, a fact that aligns with expert assessments. Additionally, we attain high accuracy in attributing authorship by evaluating the similarity of each chapter with the reference corpora. This study sheds new light on the authorship of biblical texts by providing interpretable, statistically significant evidence that there are different linguistic characteristics of biblical authors and that these differences can be identified.
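The abstract does not name the test statistic it uses, so the following is only an illustrative sketch of the general idea: attribute a chapter to the reference corpus whose word frequencies it deviates from least, using a score that reacts to a few strongly deviating words. It assumes a Higher Criticism style statistic over per-word two-proportion p-values (normal approximation). The function names, the `gamma` parameter, and the toy texts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def word_pvalues(counts_a, counts_b):
    """Per-word two-sided p-values for 'same relative frequency in both texts'.

    Assumption: a two-proportion z-test with a normal approximation.
    """
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    pvals = []
    for word in set(counts_a) | set(counts_b):
        x, y = counts_a[word], counts_b[word]
        pooled = (x + y) / (n_a + n_b)
        var = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
        if var == 0:
            continue  # a single word makes up both texts; nothing to test
        z = (x / n_a - y / n_b) / math.sqrt(var)
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))
    return pvals


def higher_criticism(pvals, gamma=0.5):
    """Higher Criticism score: large when a few p-values are unusually small."""
    n = len(pvals)
    best = 0.0
    smallest = sorted(pvals)[: max(1, int(gamma * n))]
    for i, p in enumerate(smallest, start=1):
        if 0 < p < 1:
            score = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1 - p))
            best = max(best, score)
    return best


def attribute(chapter, corpora):
    """Return the name of the closest reference corpus and all scores."""
    chapter_counts = Counter(chapter.lower().split())
    scores = {
        name: higher_criticism(
            word_pvalues(chapter_counts, Counter(text.lower().split()))
        )
        for name, text in corpora.items()
    }
    return min(scores, key=scores.get), scores


# Toy data (hypothetical, not real biblical text).
corpora = {
    "corpus_1": "law " * 30 + "covenant " * 30 + "land " * 40,
    "corpus_2": "offering " * 40 + "priest " * 30 + "land " * 30,
}
chapter = "law " * 3 + "covenant " * 3 + "land " * 4

best, scores = attribute(chapter, corpora)
print(best, scores)
```

A lower score means the chapter's word frequencies are harder to tell apart from the corpus, so the chapter is assigned to the corpus with the smallest score. Real texts would need tokenization and morphological handling suited to the source language, which this sketch leaves out.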

Related papers

Targum -- A Multilingual New Testament Translation Corpus [46.390064640459]
We introduce a multilingual corpus of 657 New Testament translations, of which 352 are unique, with unprecedented depth in five languages: English (208 unique versions from 396 total), French (41 from 78), Italian (18 from 33), Polish (30 from 48), and Spanish (55 from 102). Each translation is manually annotated with metadata that maps the text to a standardized identifier for the work, its specific edition, and its year of revision. This canonicalization empowers researchers to define "uniqueness" for their own needs.
arXiv Detail & Related papers (2026-02-10T12:27:57Z)
Computational Analysis of Character Development in Holocaust Testimonies [13.639727580099484]
This work presents a computational approach to analyze character development along the narrative timeline. We consider transcripts of Holocaust survivor testimonies as a test case, each telling the story of an individual in first-person terms. We focus on the survivor's religious trajectory, examining the evolution of their disposition toward religious belief and practice.
arXiv Detail & Related papers (2024-12-22T15:20:53Z)
A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document. Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative. Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases. Our findings indicate that, while these models generally produce text closely resembling human authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z)
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting fragmented information. This paper presents a thorough analysis of these literature reviews within the PAMI field. We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
Large language model for Bible sentiment analysis: Sermon on the Mount [1.8804426519412474]
We use sentiment analysis for studying selected chapters of the Bible. These chapters are known as the Sermon on the Mount. We detect different levels of humour, optimism, and empathy in the respective chapters that were used by Jesus to deliver his message.
arXiv Detail & Related papers (2024-01-01T07:35:29Z)
Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves. We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features. Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
A Statistical Exploration of Text Partition Into Constituents: The Case of the Priestly Source in the Books of Genesis and Exodus [1.8780017602640042]
We present a pipeline for a statistical textual exploration, offering a stylometry-based explanation and statistical validation of a hypothesized partition of a text. We apply our pipeline to the first two books in the Bible, where one stylistic component stands out in the eyes of biblical scholars, namely, the Priestly component.
arXiv Detail & Related papers (2023-05-03T15:07:42Z)
PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage. Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors. We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution [74.27826764855911]
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
arXiv Detail & Related papers (2021-10-27T06:25:31Z)
Sensing Ambiguity in Henry James' "The Turn of the Screw" [0.8528384027684192]
This work brings together computational text analysis and literary analysis to demonstrate the extent to which ambiguity in certain texts plays a key role in shaping meaning. We revisit the discussion, well known in the humanities, about the role ambiguity plays in Henry James' 19th century novella, The Turn of the Screw. We demonstrate that cosine similarity and word mover's distance are sensitive enough to detect ambiguity in its most subtle literary form.
arXiv Detail & Related papers (2020-11-21T17:53:41Z)
Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa) [5.285396202883411]
We use pattern recognition and artificial intelligence techniques to innovate the palaeography of the scrolls regarding writer identification. Although many scholars believe that 1QIsaa was written by one scribe, we report new evidence for a breaking point in the series of columns in this scroll. This study sheds new light on the Bible's ancient scribal culture by providing new, tangible evidence that ancient biblical texts were not copied by a single scribe only but that multiple scribes could closely collaborate on one particular manuscript.
arXiv Detail & Related papers (2020-10-27T17:36:18Z)
Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension. We propose to represent relations implicitly by situating structured knowledge in a context. We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z)
Comparative Computational Analysis of Global Structure in Canonical, Non-Canonical and Non-Literary Texts [0.0]
Three text types (non-literary, literary/canonical and literary/non-canonical) exhibit systematic differences with respect to structural design features as correlates of aesthetic responses in readers. Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts. Our results show that low-level properties of texts are better discriminators than high-level properties, for the three text types under analysis.
arXiv Detail & Related papers (2020-08-25T09:37:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.