How to Choose Pretrained Handwriting Recognition Models for Single
Writer Fine-Tuning
- URL: http://arxiv.org/abs/2305.02593v1
- Date: Thu, 4 May 2023 07:00:28 GMT
- Authors: Vittorio Pippi, Silvia Cascianelli, Christopher Kermorvant, Rita
Cucchiara
- Abstract summary: Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on modern and historical manuscripts.
Those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting.
In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model.
We give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR)
have led to models with remarkable performance on both modern and historical
manuscripts in large benchmark datasets. Nonetheless, those models struggle to
obtain the same performance when applied to manuscripts with peculiar
characteristics, such as language, paper support, ink, and author handwriting.
This issue is very relevant for valuable but small collections of documents
preserved in historical archives, for which obtaining sufficient annotated
training data is costly or, in some cases, unfeasible. To overcome this
challenge, a possible solution is to pretrain HTR models on large datasets and
then fine-tune them on small single-author collections. In this paper, we take
into account large, real benchmark datasets and synthetic ones obtained with a
styled Handwritten Text Generation model. Through extensive experimental
analysis, also considering the amount of fine-tuning lines, we give a
quantitative indication of the most relevant characteristics of such data for
obtaining an HTR model able to effectively transcribe manuscripts in small
collections with as few as five real fine-tuning lines.
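The fine-tuning recipe the abstract describes (pretrain on a large dataset, then adapt to a single writer with a handful of transcribed lines) can be sketched as follows. This is an illustrative PyTorch toy, not the authors' code: the `TinyHTR` model, its shapes, and the freezing strategy are assumptions standing in for a real pretrained CNN-RNN HTR backbone with a CTC head.

```python
# Hypothetical sketch: adapting a "pretrained" HTR model to one writer
# using only five transcribed lines. Model and data are placeholders.
import torch
import torch.nn as nn

class TinyHTR(nn.Module):
    """Stand-in for a pretrained HTR backbone with a CTC output head."""
    def __init__(self, num_chars=80, hidden=64):
        super().__init__()
        self.features = nn.Conv2d(1, hidden, kernel_size=3, padding=1)
        self.classifier = nn.Linear(hidden, num_chars + 1)  # +1 for CTC blank

    def forward(self, x):                    # x: (B, 1, H, W)
        f = self.features(x).mean(dim=2)     # collapse height -> (B, C, W)
        f = f.permute(2, 0, 1)               # (W, B, C), the layout CTCLoss expects
        return self.classifier(f).log_softmax(-1)

model = TinyHTR()
# Freeze the backbone and adapt only the output head -- one common
# strategy when only ~5 labelled lines are available (an assumption
# here, not necessarily the paper's exact setup).
for p in model.features.parameters():
    p.requires_grad = False

images = torch.randn(5, 1, 32, 128)          # five real fine-tuning line images
targets = torch.randint(1, 80, (5, 12))      # dummy character-index transcriptions
ctc = nn.CTCLoss(blank=80)
opt = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

for step in range(3):                        # a few adaptation steps
    log_probs = model(images)                # (W, B, num_chars+1)
    input_lens = torch.full((5,), log_probs.size(0), dtype=torch.long)
    target_lens = torch.full((5,), 12, dtype=torch.long)
    loss = ctc(log_probs, targets, input_lens, target_lens)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice the choice of pretraining data (real benchmarks vs. styled synthetic handwriting) matters more than the adaptation loop itself, which is exactly the trade-off the paper quantifies.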
Related papers
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- Detection and Measurement of Syntactic Templates in Generated Text
We offer an analysis of syntactic features to characterize general repetition in models.
We find that models tend to produce templated text in downstream tasks at a higher rate than what is found in human-reference texts.
arXiv Detail & Related papers (2024-06-28T19:34:23Z)
- Retrieval is Accurate Generation
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- Self-Supervised Representation Learning for Online Handwriting Text Classification
We propose the novel Part of Stroke Masking (POSM) as a pretext task for pretraining models to extract informative representations from the online handwriting of individuals in English and Chinese languages.
To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods.
The pretrained models are fine-tuned to achieve state-of-the-art results in tasks such as writer identification, gender classification, and handedness classification.
arXiv Detail & Related papers (2023-10-10T14:07:49Z)
- The Effects of Character-Level Data Augmentation on Style-Based Dating of Historical Manuscripts
This article explores the influence of data augmentation on the dating of historical manuscripts.
Linear Support Vector Machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from historical manuscripts.
Results show that training models with augmented data improves the performance of historical manuscript dating by 1%-3% in cumulative scores.
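The evaluation setup this summary describes (a linear SVM scored with k-fold cross-validation over handcrafted features) can be sketched in a few lines of scikit-learn. The feature vectors and class labels below are random stand-ins, not the article's textural or grapheme-based features.

```python
# Illustrative sketch (not the article's code): linear SVM with
# 5-fold cross-validation over stand-in feature vectors.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 32))       # placeholder "textural/grapheme" features
y = rng.integers(0, 3, size=120)     # placeholder date classes (3 periods)

# One accuracy score per held-out fold; the article's 1%-3% gains
# would show up as a shift in this cross-validated mean.
scores = cross_val_score(LinearSVC(max_iter=5000), X, y, cv=5)
print(scores.mean())
```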
arXiv Detail & Related papers (2022-12-15T15:55:44Z)
- Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering
Unique handwriting styles may differ in a blend of several factors, including character size, stroke width, loops, ductus, slant angles, and cursive ligatures.
Previous work on labeled data with Hidden Markov models, support vector machines, and semi-supervised recurrent neural networks has provided moderate to high success.
In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis.
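The pipeline this summary names (linear PCA followed by fuzzy soft clustering) can be illustrated with a toy fuzzy c-means in NumPy. This is a generic sketch under assumed data, not the study's implementation; the two synthetic blobs stand in for feature vectors from two hands.

```python
# Illustrative toy: PCA (via SVD) then fuzzy c-means soft clustering,
# in the spirit of the summarized study -- not their actual code.
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    """Standard fuzzy c-means: soft membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # random soft init
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted means
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))            # inverse-distance update
        U /= U.sum(axis=1, keepdims=True)             # rows sum to 1
    return U, centers

# Two synthetic "hands": well-separated Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 10)), rng.normal(4, 1, (30, 10))])

# Linear PCA via SVD of the centered data, keeping the top 2 components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

U, _ = fuzzy_cmeans(Z, c=2)   # soft memberships mark where the hand shifts
```

A hand shift would appear as the point along the page order where the dominant membership column flips.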
arXiv Detail & Related papers (2022-10-30T09:07:51Z)
- PART: Pre-trained Authorship Representation Transformer
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
- One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition
Low-resource Handwritten Text Recognition is a hard problem due to scarce annotated data and very limited linguistic information.
In this paper we address this problem through a data generation technique based on Bayesian Program Learning.
Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet.
arXiv Detail & Related papers (2021-05-11T18:53:01Z)
- MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition
We propose a new approach to handwritten text recognition.
We use a novel meta-learning framework which exploits additional new-writer data.
Our framework can be easily implemented on top of most state-of-the-art HTR models.
arXiv Detail & Related papers (2021-04-05T12:35:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.