Quo Vadis Handwritten Text Generation for Handwritten Text Recognition?
- URL: http://arxiv.org/abs/2508.09936v1
- Date: Wed, 13 Aug 2025 16:39:18 GMT
- Title: Quo Vadis Handwritten Text Generation for Handwritten Text Recognition?
- Authors: Vittorio Pippi, Konstantina Nikolaidou, Silvia Cascianelli, George Retsinas, Giorgos Sfikas, Rita Cucchiara, Marcus Liwicki
- Abstract summary: The digitization of historical manuscripts presents significant challenges for Handwritten Text Recognition (HTR) systems. Handwritten Text Generation (HTG) techniques generate synthetic data tailored to specific handwriting styles. We compare three state-of-the-art styled HTG models to assess their impact on HTR fine-tuning.
- Score: 34.1205194877339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The digitization of historical manuscripts presents significant challenges for Handwritten Text Recognition (HTR) systems, particularly when dealing with small, author-specific collections that diverge from the training data distributions. Handwritten Text Generation (HTG) techniques, which generate synthetic data tailored to specific handwriting styles, offer a promising solution to address these challenges. However, the effectiveness of various HTG models in enhancing HTR performance, especially in low-resource transcription settings, has not been thoroughly evaluated. In this work, we systematically compare three state-of-the-art styled HTG models (representing the generative adversarial, diffusion, and autoregressive paradigms for HTG) to assess their impact on HTR fine-tuning. We analyze how visual and linguistic characteristics of synthetic data influence fine-tuning outcomes and provide quantitative guidelines for selecting the most effective HTG model. The results of our analysis provide insights into the current capabilities of HTG methods and highlight key areas for further improvement in their application to low-resource HTR.
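Comparisons of HTR fine-tuning outcomes like the one described above are typically quantified with the Character Error Rate (CER): the character-level edit distance between the predicted and ground-truth transcriptions, normalized by the reference length. A minimal reference sketch (the function name is illustrative, not taken from the paper):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: Levenshtein distance / reference length."""
    # Dynamic-programming edit distance over characters.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution
            ))
        prev = curr
    return prev[-1] / max(len(reference), 1)
```

A CER of 0.0 means a perfect transcription; fine-tuning on synthetic styled data aims to lower this value on the target author's manuscripts.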
Related papers
- Autoregressive Styled Text Image Generation, but Make it Reliable [51.09340470015673]
This work is dedicated to developing strategies that reproduce the characteristics of a given writer, with promising results in terms of style fidelity and generalization achieved by the recently proposed Autoregressive Transformer paradigm for HTG. In this work, we rethink the autoregressive paradigm by framing HTG as a multimodal prompt-conditioned generation task, tackling content controllability issues by introducing special input tokens for better alignment with the visual ones.
arXiv Detail & Related papers (2025-10-27T11:54:23Z) - Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models [0.0]
We apply TrOCR, a state-of-the-art transformer-based HTR model, to 16th-century Latin manuscripts authored by Rudolf Gwalther. We introduce four novel augmentation methods designed specifically for historical handwriting characteristics.
arXiv Detail & Related papers (2025-08-15T14:20:58Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach [53.189911918976655]
We propose DOLPHIN, a novel retrieval model designed to enhance handwriting representations through synergistic temporal-frequency analysis. We introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals. Our findings emphasize the significance of point sampling frequency and pressure features in improving handwriting representation quality.
arXiv Detail & Related papers (2024-12-16T11:19:22Z) - DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [7.398476020996681]
DiffusionPen (DiffPen) is a 5-shot style handwritten text generation approach based on Latent Diffusion Models.
Our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples.
Our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.
arXiv Detail & Related papers (2024-09-09T20:58:25Z) - Rethinking HTG Evaluation: Bridging Generation and Recognition [7.398476020996681]
We introduce three measures tailored for HTG evaluation, including $\text{HTG}_{\text{style}}$ and $\text{HTG}_{\text{OOV}}$.
The metrics rely on the recognition error/accuracy of Handwriting Text Recognition and Writer Identification models.
Our findings show that our metrics are richer in information and underscore the necessity of standardized evaluation protocols in HTG.
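The recognition-based idea behind such metrics can be sketched as scoring generated images by how well an HTR model transcribes them. In the sketch below, the `transcribe` callable, the `cer` function, and the data layout are placeholders standing in for the paper's actual models and interfaces:

```python
def htg_recognition_score(samples, transcribe, cer):
    """Average CER of an HTR model over (image, target_text) pairs.

    `transcribe` maps an image to its predicted text; `cer` computes
    a character error rate between two strings. Lower scores mean the
    generated images are more legible to the recognizer.
    """
    errors = [cer(text, transcribe(image)) for image, text in samples]
    return sum(errors) / len(errors)
```

The same scheme extends to writer identification: swap the transcription model for a writer classifier and score style fidelity instead of content legibility.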
arXiv Detail & Related papers (2024-09-04T13:15:10Z) - Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams [49.3179290313959]
This study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models.
We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions.
Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification.
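Batch All Triplets loss, one of the two losses highlighted above, averages a margin-based hinge loss over every valid (anchor, positive, negative) triple in a batch. A minimal pure-Python sketch over precomputed embeddings (the margin value and function name are illustrative, not the study's settings):

```python
import math

def batch_all_triplet_loss(embeddings, labels, margin=0.2):
    """Average of max(0, d(a, p) - d(a, n) + margin) over all valid
    triplets: anchor a and positive p share a label, negative n does not."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    losses = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue
                losses.append(max(0.0,
                    dist(embeddings[a], embeddings[p])
                    - dist(embeddings[a], embeddings[neg]) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

In practice this is computed on GPU over a pairwise distance matrix rather than with explicit loops; the triple loop here only makes the definition explicit.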
arXiv Detail & Related papers (2024-03-18T23:41:52Z) - HWD: A Novel Evaluation Score for Styled Handwritten Text Generation [36.416044687373535]
Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis, aiming to generate text images with the handwriting of given reference images.
We devise the Handwriting Distance (HWD), tailored for HTG evaluation.
In particular, it works in the feature space of a network specifically trained to extract handwriting style features from variable-length input images and exploits a perceptual distance to compare the subtle geometric features of handwriting.
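Conceptually, a perceptual distance of this kind compares style descriptors extracted from real and generated images. A simplified sketch assuming the feature vectors have already been produced by a style encoder; the average-then-Euclidean step is an illustration of the idea, not the exact HWD formula:

```python
import math

def perceptual_style_distance(real_feats, gen_feats):
    """Euclidean distance between the mean style-feature vectors of two
    sets of handwriting images (features precomputed by a style encoder)."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    mu_real, mu_gen = mean(real_feats), mean(gen_feats)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_real, mu_gen)))
```

The key design choice is the feature space: because the encoder is trained on handwriting style rather than generic image classification, small geometric differences in stroke shape move the features more than background or layout changes do.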
arXiv Detail & Related papers (2023-10-31T09:44:27Z) - How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning [23.274139396706264]
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on modern and historical manuscripts.
However, these models struggle to achieve the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting.
In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model.
We give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
arXiv Detail & Related papers (2023-05-04T07:00:28Z) - A Study of Augmentation Methods for Handwritten Stenography Recognition [0.0]
We study 22 classical augmentation techniques, most of which are commonly used for HTR of other scripts.
We identify a group of augmentations, including, for example, contained ranges of random rotation, shift, and scaling, that are beneficial to the use case of stenography recognition.
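The "contained ranges" finding can be sketched as a sampler that draws geometric augmentation parameters from deliberately narrow intervals. The specific bounds below are illustrative placeholders, not the values reported in the paper:

```python
import random

def sample_augmentation(rng=random):
    """Draw rotation (degrees), shift (fraction of image size), and scale
    factors from contained ranges, as favoured for stenography recognition."""
    return {
        "rotation_deg": rng.uniform(-2.0, 2.0),   # small rotations only
        "shift_frac": rng.uniform(-0.05, 0.05),   # slight translation
        "scale": rng.uniform(0.95, 1.05),         # near-identity scaling
    }
```

Keeping the ranges tight preserves the fine stroke geometry that stenographic scripts depend on, while still diversifying the training data.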
arXiv Detail & Related papers (2023-03-05T20:06:19Z) - Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z) - Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.