Character Detection using YOLO for Writer Identification in multiple Medieval books
- URL: http://arxiv.org/abs/2601.04834v1
- Date: Thu, 08 Jan 2026 11:11:24 GMT
- Title: Character Detection using YOLO for Writer Identification in multiple Medieval books
- Authors: Alessandra Scotto di Freca, Tiziana D'Alessandro, Francesco Fontanella, Filippo Sarria, Claudio De Stefano
- Abstract summary: Estimating when a document was written and tracing the development of scripts and writing styles can be aided by identifying the individual scribes who contributed to a medieval manuscript. We previously proposed an approach focused on identifying specific letters or abbreviations that characterize each writer. We used template matching techniques to detect the occurrences of the character "a" on each page and a convolutional neural network (CNN) to attribute each instance to the correct scribe.
- Score: 37.5324866770459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Paleography is the study of ancient and historical handwriting; its key objectives include dating manuscripts and understanding the evolution of writing. Identifying the individual scribes who contributed to a medieval manuscript can help estimate when a document was written and trace the development of scripts and writing styles. Although digital technologies have made significant progress in this field, the general problem remains unsolved and continues to pose open challenges. ... We previously proposed an approach focused on identifying specific letters or abbreviations that characterize each writer. In that study, we considered the letter "a", as it was widely present on all pages of text and highly distinctive, according to the suggestions of expert paleographers. We used template matching to detect the occurrences of the character "a" on each page and a convolutional neural network (CNN) to attribute each instance to the correct scribe. Building on the encouraging results of that system, and aware that template matching requires an appropriate threshold to work, we experimented in the same framework with the YOLO object detection model to identify the scribes who contributed to different medieval books. We used the fifth version of YOLO, which completely replaced the template matching and CNN used in the previous work. The experimental results show that YOLO extracts a greater number of instances of the considered letter, leading to a more accurate second-stage classification. Furthermore, the YOLO confidence score provides a foundation for developing a system that applies a rejection threshold, enabling reliable writer identification even in unseen manuscripts.
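The rejection threshold mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say how per-letter detections are combined into a page-level decision, so the confidence-weighted vote below is an assumption, not the authors' method. The names `Detection`, `identify_writer`, and `reject_below`, as well as the default threshold of 0.5, are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class Detection(NamedTuple):
    """One detected letter (hypothetical structure, not from the paper)."""
    scribe: str        # scribe label assigned to this detected letter
    confidence: float  # detector confidence score, assumed to lie in [0, 1]


def identify_writer(detections: Iterable[Detection],
                    reject_below: float = 0.5) -> Optional[str]:
    """Return the most likely scribe for a page, or None to reject the page.

    Assumed rule: discard detections whose confidence is below
    `reject_below`, then pick the scribe with the largest summed confidence.
    """
    votes = defaultdict(float)
    for det in detections:
        if det.confidence < reject_below:
            continue  # low-confidence detection is rejected
        votes[det.scribe] += det.confidence
    if not votes:
        return None  # nothing survived the threshold: no decision
    return max(votes, key=votes.get)


page = [
    Detection("scribe_A", 0.90),
    Detection("scribe_B", 0.40),  # below the threshold, ignored
    Detection("scribe_A", 0.70),
    Detection("scribe_B", 0.95),
]
print(identify_writer(page))
```

Returning `None` when no detection clears the threshold is one way to model a page from an unseen manuscript, where the system should abstain instead of forcing an attribution.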
Related papers
- Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text. Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on the DeepFake dataset. Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z)
- PLATTER: A Page-Level Handwritten Text Recognition System for Indic Scripts [20.394597266150534]
We present an end-to-end framework for Page-Level hAndwriTTen TExt Recognition (PLATTER). Secondly, we demonstrate the usage of PLATTER to measure the performance of our language-agnostic HTD model. Finally, we release a Corpus of Handwritten Indic Scripts (CHIPS), a meticulously curated, page-level Indic handwritten OCR dataset.
arXiv Detail & Related papers (2025-02-10T05:50:26Z)
- Contrastive Masked Autoencoders for Character-Level Open-Set Writer Identification [25.996617568144675]
This paper introduces Contrastive Masked Auto-Encoders (CMAE) for Character-level Open-Set Writer Identification. We merge Masked Auto-Encoders (MAE) with Contrastive Learning (CL) to simultaneously and respectively capture sequential information and distinguish diverse handwriting styles. Our model achieves state-of-the-art results on the CASIA online handwriting dataset, reaching an impressive precision rate of 89.7%.
arXiv Detail & Related papers (2025-01-21T05:15:10Z)
- Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach [53.189911918976655]
We propose DOLPHIN, a novel retrieval model designed to enhance handwriting representations through synergistic temporal-frequency analysis. We introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals. Our findings emphasize the significance of point sampling frequency and pressure features in improving handwriting representation quality.
arXiv Detail & Related papers (2024-12-16T11:19:22Z)
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- Writer Retrieval and Writer Identification in Greek Papyri [4.44566870214758]
Writer identification refers to the classification of known writers while writer retrieval seeks to find the writer by means of image similarity in a dataset of images.
While automatic writer identification/retrieval methods already provide promising results for many historical document types, papyri data is very challenging due to the fiber structures and severe artifacts.
We investigate several methods and show that a good binarization is key to an improved writer identification in papyri writings.
arXiv Detail & Related papers (2022-12-15T08:42:25Z)
- Cloning Ideology and Style using Deep Learning [0.0]
The research focuses on text generation based on the ideology and style of a specific author, and on text generation for a topic that the same author has not written about in the past.
A Bi-LSTM model is used to make predictions at the character level; during training, the corpus of a specific author is used along with the ground truth corpus.
A pre-trained model is used to identify the sentences of ground truth having contradiction with the author's corpus to make our language model inclined.
arXiv Detail & Related papers (2022-10-25T11:37:19Z)
- PART: Pre-trained Authorship Representation Transformer [52.623051272843426]
Authors writing documents imprint identifying information within their texts. Previous works use hand-crafted features or classification tasks to train their authorship models. We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z)