Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
- URL: http://arxiv.org/abs/2508.11499v1
- Date: Fri, 15 Aug 2025 14:20:58 GMT
- Title: Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
- Authors: Erez Meoded
- Abstract summary: We apply TrOCR, a state-of-the-art transformer-based HTR model, to 16th-century Latin manuscripts authored by Rudolf Gwalther. We introduce four novel augmentation methods designed specifically for historical handwriting characteristics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Historical handwritten text recognition (HTR) is essential for unlocking the cultural and scholarly value of archival documents, yet digitization is often hindered by scarce transcriptions, linguistic variation, and highly diverse handwriting styles. In this study, we apply TrOCR, a state-of-the-art transformer-based HTR model, to 16th-century Latin manuscripts authored by Rudolf Gwalther. We investigate targeted image preprocessing and a broad suite of data augmentation techniques, introducing four novel augmentation methods designed specifically for historical handwriting characteristics. We also evaluate ensemble learning approaches to leverage the complementary strengths of augmentation-trained models. On the Gwalther dataset, our best single-model augmentation (Elastic) achieves a Character Error Rate (CER) of 1.86, while a top-5 voting ensemble achieves a CER of 1.60 - representing a 50% relative improvement over the best reported TrOCR_BASE result and a 42% improvement over the previous state of the art. These results highlight the impact of domain-specific augmentations and ensemble strategies in advancing HTR performance for historical manuscripts.
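The abstract's headline numbers rest on two mechanics: the Character Error Rate metric and a top-5 voting ensemble. The sketch below is illustrative only, not the authors' code: CER is the standard Levenshtein edit distance normalized by reference length, and the `vote` function uses a naive whole-line majority vote as a stand-in for whatever alignment the paper's ensemble actually performs.

```python
from collections import Counter

def cer(prediction: str, reference: str) -> float:
    """Character Error Rate: Levenshtein edit distance between the
    predicted and reference transcriptions, divided by reference length."""
    m, n = len(prediction), len(reference)
    # Standard dynamic-programming edit distance, row by row.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if prediction[i - 1] == reference[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(n, 1)

def vote(predictions: list[str]) -> str:
    """Naive ensemble vote over whole-line transcriptions from several
    models: the most common output wins, ties broken by first occurrence."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, `cer("abcd", "abed")` is 0.25 (one substitution over four reference characters), and `vote` over five model outputs returns the transcription the majority agrees on.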
Related papers
- Quo Vadis Handwritten Text Generation for Handwritten Text Recognition? [34.1205194877339]
The digitization of historical manuscripts presents significant challenges for Handwritten Text Recognition (HTR) systems. Handwritten Text Generation (HTG) techniques generate synthetic data tailored to specific handwriting styles. We compare three state-of-the-art styled HTG models to assess their impact on HTR fine-tuning.
arXiv Detail & Related papers (2025-08-13T16:39:18Z) - Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques [4.5220419118352915]
This paper presents a survey of offline handwritten data augmentation and generation techniques. We examine traditional augmentation methods alongside recent advances in deep learning. We explore the challenges associated with generating diverse and realistic handwriting samples.
arXiv Detail & Related papers (2025-07-08T12:03:58Z) - HTR-JAND: Handwritten Text Recognition with Joint Attention Network and Knowledge Distillation [21.25786478579275]
Current Handwritten Text Recognition (HTR) systems struggle with the inherent complexity of historical documents. This paper introduces HTR-JAND, an efficient HTR framework that combines advanced feature extraction with knowledge distillation. We enhance recognition accuracy through context-aware T5 post-processing, particularly effective for historical documents.
arXiv Detail & Related papers (2024-12-24T16:08:24Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach [53.189911918976655]
We propose DOLPHIN, a novel retrieval model designed to enhance handwriting representations through synergistic temporal-frequency analysis. We introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals. Our findings emphasize the significance of point sampling frequency and pressure features in improving handwriting representation quality.
arXiv Detail & Related papers (2024-12-16T11:19:22Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback [69.4639239117551]
FigCaps-HF is a new framework for figure-caption generation that incorporates domain expert feedback in generating captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences.
arXiv Detail & Related papers (2023-07-20T13:40:22Z) - How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning [23.274139396706264]
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on modern and historical manuscripts.
These models struggle to achieve the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting.
In this paper, we consider large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. We give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
arXiv Detail & Related papers (2023-05-04T07:00:28Z) - The Challenges of HTR Model Training: Feedback from the Project Donner
le gout de l'archive a l'ere numerique [0.0]
This article reports on the impact of creating transcription protocols and using the language model at full scale. It also identifies the best way to use base models to help increase the performance of handwritten text recognition models.
arXiv Detail & Related papers (2022-12-13T12:42:12Z) - PART: Pre-trained Authorship Representation Transformer [52.623051272843426]
Authors writing documents imprint identifying information within their texts. Previous works use hand-crafted features or classification tasks to train their authorship models. We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions [52.250269529057014]
Handwritten Text Recognition (HTR) in free layout pages is a challenging image understanding task.
We propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text.
arXiv Detail & Related papers (2022-08-17T06:55:54Z) - SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique that increases the performance of current state-of-the-art methods. We combine the well-known patch loss with information gathered from a handwritten text recognition system trained in parallel. This yields a stronger local discriminator and produces more realistic, higher-quality generated handwritten words.
arXiv Detail & Related papers (2021-05-21T18:34:21Z)