Dating ancient manuscripts using radiocarbon and AI-based writing style analysis
- URL: http://arxiv.org/abs/2407.12013v2
- Date: Fri, 18 Oct 2024 08:57:30 GMT
- Title: Dating ancient manuscripts using radiocarbon and AI-based writing style analysis
- Authors: Mladen Popović, Maruf A. Dhali, Lambert Schomaker, Johannes van der Plicht, Kaare Lund Rasmussen, Jacopo La Nasa, Ilaria Degano, Maria Perla Colombini, Eibert Tigchelaar
- Abstract summary: We present Enoch, an AI-based date-prediction model, trained on the basis of new radiocarbon-dated samples of the Dead Sea Scrolls.
Enoch could predict the radiocarbon-based dates from style, supported by leave-one-out validation, with varied MAEs of 27.9 to 30.7 years relative to the radiocarbon dating.
It was then used to estimate the dates of 135 unseen manuscripts, revealing that 79 per cent of the samples were considered 'realistic' upon palaeographic post-hoc evaluation.
- Score: 1.828413418929518
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Determining the chronology of ancient handwritten manuscripts is essential for reconstructing the evolution of ideas. For the Dead Sea Scrolls, this is particularly important. However, there is an almost complete lack of date-bearing manuscripts evenly distributed across the timeline and written in similar scripts available for palaeographic comparison. Here, we present Enoch, a state-of-the-art AI-based date-prediction model, trained on the basis of new radiocarbon-dated samples of the scrolls. Enoch uses established handwriting-style descriptors and applies Bayesian ridge regression. The challenge of this study is that the number of radiocarbon-dated manuscripts is small, while current machine learning requires an abundance of training data. We show that by using combined angular and allographic writing style feature vectors and applying Bayesian ridge regression, Enoch could predict the radiocarbon-based dates from style, supported by leave-one-out validation, with varied MAEs of 27.9 to 30.7 years relative to the radiocarbon dating. Enoch was then used to estimate the dates of 135 unseen manuscripts, revealing that 79 per cent of the samples were considered 'realistic' upon palaeographic post-hoc evaluation. We present a new chronology of the scrolls. The radiocarbon ranges and Enoch's style-based predictions are often older than the traditionally assumed palaeographic estimates. In the range of 300-50 BCE, Enoch's date prediction provides an improved granularity. The study is in line with current developments in multimodal machine-learning techniques, and the methods can be used for date prediction in other partially-dated manuscript collections. This research shows how Enoch's quantitative, probability-based approach can be a tool for palaeographers and historians, re-dating ancient Jewish key texts and contributing to current debates on Jewish and Christian origins.
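The validation scheme named in the abstract (regress dates on style features, hold out one manuscript at a time, report the mean absolute error) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the data are synthetic, the three-dimensional features stand in for the angular and allographic descriptors, and a plain ridge regression with a fixed penalty stands in for the Bayesian ridge regression the authors use. It is not the Enoch pipeline.

```python
import random
from statistics import mean


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(features, dates, penalty=1.0):
    """Fit ridge regression on centred data; return (weights, intercept)."""
    d = len(features[0])
    col_means = [mean(row[j] for row in features) for j in range(d)]
    date_mean = mean(dates)
    xc = [[row[j] - col_means[j] for j in range(d)] for row in features]
    yc = [y - date_mean for y in dates]
    # Normal equations: (X^T X + penalty * I) w = X^T y
    gram = [[sum(r[i] * r[j] for r in xc) + (penalty if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(d)]
    weights = solve_linear_system(gram, rhs)
    intercept = date_mean - sum(w * m for w, m in zip(weights, col_means))
    return weights, intercept


def predict(weights, intercept, row):
    """Predict a date (in years) for one feature vector."""
    return intercept + sum(w * v for w, v in zip(weights, row))


def leave_one_out_mae(features, dates, penalty=1.0):
    """Hold out each sample once, refit on the rest, average the absolute errors."""
    errors = []
    for i in range(len(dates)):
        train_x = features[:i] + features[i + 1:]
        train_y = dates[:i] + dates[i + 1:]
        weights, intercept = fit_ridge(train_x, train_y, penalty)
        errors.append(abs(predict(weights, intercept, features[i]) - dates[i]))
    return mean(errors)


# Synthetic stand-in data (hypothetical): 24 "manuscripts", 3 style features,
# dates in years with negative values meaning BCE, plus Gaussian noise.
rng = random.Random(0)
true_weights = [40.0, -25.0, 10.0]
features = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(24)]
exact_dates = [-150.0 + sum(w * v for w, v in zip(true_weights, row))
               for row in features]
dates = [y + rng.gauss(0, 20) for y in exact_dates]

mae = leave_one_out_mae(features, dates)
print(f"Leave-one-out MAE on synthetic data: {mae:.1f} years")
```

Leave-one-out validation suits the situation the abstract describes, where the number of radiocarbon-dated samples is small: every sample is used for testing exactly once while the model still trains on all the others. In practice a library implementation such as scikit-learn's `BayesianRidge` would replace the hand-written solver, and it additionally estimates the regularisation strength from the data.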
Related papers
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- A ripple in time: a discontinuity in American history [49.84018914962972]
We suggest a novel approach to discover temporal (related and unrelated to language dilation) and personality (authorship attribution) aspects in historical datasets.
We exemplify our approach on the State of the Union addresses given by the past 42 US presidents.
arXiv Detail & Related papers (2023-12-02T17:24:17Z)
- PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents.
We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z)
- Blind Dates: Examining the Expression of Temporality in Historical Photographs [57.07335632641355]
We investigate the dating of images using OpenCLIP, an open-source implementation of CLIP, a multi-modal language and vision model.
We use the De Boer Scene Detection dataset, containing 39,866 gray-scale historical press photographs from 1950 to 1999.
Our analysis reveals that images featuring buses, cars, cats, dogs, and people are more accurately dated, suggesting the presence of temporal markers.
arXiv Detail & Related papers (2023-10-10T13:51:24Z)
- Look-back Decoding for Open-Ended Text Generation [62.53302138266465]
We propose Look-back, an improved decoding algorithm that tracks the distribution distance between current and historical decoding steps.
Look-back can automatically predict potential repetitive phrase and topic drift, and remove tokens that may cause the failure modes.
We perform decoding experiments on document continuation and story generation, and demonstrate that Look-back is able to generate more fluent and coherent text.
arXiv Detail & Related papers (2023-05-22T20:42:37Z)
- Quran Recitation Recognition using End-to-End Deep Learning [0.0]
The Quran is the holy scripture of Islam, and its recitation is an important aspect of the religion.
Recognizing the recitation of the Holy Quran automatically is a challenging task due to its unique rules.
We propose a novel end-to-end deep learning model for recognizing the recitation of the Holy Quran.
arXiv Detail & Related papers (2023-05-10T18:40:01Z)
- BERT-based Authorship Attribution on the Romanian Dataset called ROST [0.0]
We use a model to detect the authorship of texts written in the Romanian language.
The dataset used is highly unbalanced, i.e., significant differences in the number of texts per author.
Results are better than expected, sometimes exceeding 87% macro-accuracy.
arXiv Detail & Related papers (2023-01-29T17:37:29Z)
- The Effects of Character-Level Data Augmentation on Style-Based Dating of Historical Manuscripts [5.285396202883411]
This article explores the influence of data augmentation on the dating of historical manuscripts.
Linear Support Vector Machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from historical manuscripts.
Results show that training models with augmented data improves the performance of historical manuscript dating by 1% - 3% in cumulative scores.
arXiv Detail & Related papers (2022-12-15T15:55:44Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- Multiple regression techniques for modeling dates of first performances of Shakespeare-era plays [2.1827922098806214]
We took a set of Shakespeare-era plays (181 plays from the period 1585--1610) and added the best-guess dates for them from a standard reference work as metadata.
We applied 11 regression methods to predict the dates of the plays at an 80/20 training/test split.
An in-depth analysis of the most commonly occurring 20 words in the models in 100 independent runs helps explain the trends in linguistic and stylistic terms.
arXiv Detail & Related papers (2021-04-13T04:13:53Z)
- Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa) [5.285396202883411]
We use pattern recognition and artificial intelligence techniques to innovate the palaeography of the scrolls regarding writer identification.
Although many scholars believe that 1QIsaa was written by one scribe, we report new evidence for a breaking point in the series of columns in this scroll.
This study sheds new light on the Bible's ancient scribal culture by providing new, tangible evidence that ancient biblical texts were not copied by a single scribe only but that multiple scribes could closely collaborate on one particular manuscript.
arXiv Detail & Related papers (2020-10-27T17:36:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.