Making Characters Count. A Computational Approach to Scribal Profiling in 14th-Century Middle Dutch Manuscripts from the Carthusian Monastery of Herne
- URL: http://arxiv.org/abs/2509.00067v1
- Date: Tue, 26 Aug 2025 08:20:40 GMT
- Title: Making Characters Count. A Computational Approach to Scribal Profiling in 14th-Century Middle Dutch Manuscripts from the Carthusian Monastery of Herne
- Authors: Caroline Vandyck, Wouter Haverals, Mike Kestemont
- Abstract summary: The Carthusian monastery of Herne was exceptionally prolific in producing high-quality manuscripts during the late 14th century. Previous research has distinguished thirteen different scribal hands based on paleography and codicology. We revisit this hypothesis through the lens of linguistic characteristics of the texts, using computational methods from the field of scribal profiling.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Carthusian monastery of Herne was exceptionally prolific in producing high-quality manuscripts during the late 14th century. Although the scribes remain anonymous, previous research has distinguished thirteen different scribal hands based on paleography and codicology. In this study, we revisit this hypothesis through the lens of linguistic characteristics of the texts, using computational methods from the field of scribal profiling. Using a newly created corpus of diplomatic and HTR-based transcriptions, we analyze abbreviation practices across the Herne scribes and demonstrate that abbreviation density provides a distinctive metric for differentiating scribal hands. In combination with a stylometric bag-of-characters model with brevigraph features, this approach corroborates and refines earlier hypotheses about scribal attribution, including evidence that challenges the role of scribe α in Vienna, ÖNB, SN 65. Our results highlight the value of combining computational stylometry with traditional codicology, showing how even the smallest elements of the written system -- characters and abbreviations -- can reveal patterns of scribal identity, collaboration, and manuscript transmission.
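The two measures named in the abstract, abbreviation density and a bag-of-characters profile, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The brevigraph inventory `BREVIGRAPHS`, the function names, and the definition of density as the share of non-whitespace characters that are brevigraphs are all assumptions made for illustration, and the paper may define these quantities differently.

```python
from collections import Counter

# Hypothetical brevigraph inventory (assumption, not taken from the paper):
# Tironian et, p with stroke through descender, con sign, combining macron.
BREVIGRAPHS = frozenset({"\u204a", "\ua751", "\ua76f", "\u0304"})


def abbreviation_density(text, brevigraphs=BREVIGRAPHS):
    """Share of non-whitespace characters that are brevigraphs (assumed definition)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in brevigraphs for c in chars) / len(chars)


def bag_of_characters(text):
    """Relative frequency of each non-whitespace character, brevigraphs included."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return {}
    counts = Counter(chars)
    return {c: n / len(chars) for c, n in counts.items()}
```

Profiles of this kind, computed per text sample, could then be compared across scribal hands with any standard distance measure or classifier.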
Related papers
- Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach [53.189911918976655]
We propose DOLPHIN, a novel retrieval model designed to enhance handwriting representations through synergistic temporal-frequency analysis. We introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals. Our findings emphasize the significance of point sampling frequency and pressure features in improving handwriting representation quality.
arXiv Detail & Related papers (2024-12-16T11:19:22Z) - Enhancement of text recognition for hanja handwritten documents of Ancient Korea [0.769672852567215]
We implement a high-performance optical character recognition model for classical handwritten documents. The recognition of hanja handwritten documents is a meaningful and special challenge.
arXiv Detail & Related papers (2024-12-14T02:29:07Z) - Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering [0.0]
Unique handwriting styles may be dissimilar in a blend of several factors including character size, stroke width, loops, ductus, slant angles, and cursive ligatures.
Previous work on labeled data with Hidden Markov models, support vector machines, and semi-supervised recurrent neural networks has provided moderate to high success.
In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis.
arXiv Detail & Related papers (2022-10-30T09:07:51Z) - PART: Pre-trained Authorship Representation Transformer [52.623051272843426]
Authors writing documents imprint identifying information within their texts. Previous works use hand-crafted features or classification tasks to train their authorship models. We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Digital Editions as Distant Supervision for Layout Analysis of Printed Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z) - Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z) - Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution [74.27826764855911]
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
arXiv Detail & Related papers (2021-10-27T06:25:31Z) - Image Collation: Matching illustrations in manuscripts [76.21388548732284]
We introduce the task of illustration collation and a large annotated public dataset to evaluate solutions.
We analyze state of the art similarity measures for this task and show that they succeed in simple cases but struggle for large manuscripts.
We show clear evidence that significant performance boosts can be expected by exploiting cycle-consistent correspondences.
arXiv Detail & Related papers (2021-08-18T12:12:14Z) - Stylometry for Noisy Medieval Data: Evaluating Paul Meyer's Hagiographic Hypothesis [0.0]
We use a workflow combining handwritten text recognition and stylometric analysis, applied to the case of the hagiographic works contained in MS BnF, fr. 412.
We seek to evaluate Paul Meyer's hypothesis about the constitution of groups of hagiographic works, as well as to examine potential authorial groupings in a vastly anonymous corpus.
arXiv Detail & Related papers (2020-12-07T16:48:34Z) - Spectral Graph-based Features for Recognition of Handwritten Characters: A Case Study on Handwritten Devanagari Numerals [0.0]
We propose an approach that exploits the robust graph representation and spectral graph embedding concept to represent handwritten characters.
For corroboration of the efficacy of the proposed method, extensive experiments were carried out on the standard handwritten numeral Computer Vision Pattern Recognition, Unit of Indian Statistical Institute Kolkata dataset.
arXiv Detail & Related papers (2020-07-07T08:40:08Z)