The Challenges of HTR Model Training: Feedback from the Project Donner
le goût de l'archive à l'ère numérique
- URL: http://arxiv.org/abs/2212.11146v4
- Date: Sun, 12 Nov 2023 18:04:24 GMT
- Title: The Challenges of HTR Model Training: Feedback from the Project Donner
le goût de l'archive à l'ère numérique
- Authors: Beatrice Couture (Université de Montréal), Farah Verret
(Université de Montréal), Maxime Gohier (Université du Québec à
Rimouski), Dominique Deslandres (Université de Montréal)
- Abstract summary: This article reports on the impact of creating transcription protocols and of using the language model at full scale.
It also determines how best to use base models to increase the performance of handwritten text recognition models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The arrival of handwriting recognition technologies offers new possibilities
for research in heritage studies. However, it is now necessary to reflect on
the experiences and practices developed by research teams. Our use of the
Transkribus platform since 2018 has led us to search for the most effective
ways to improve the performance of our handwritten text recognition (HTR)
models, which are built to transcribe French handwriting dating from the 17th
century. This article therefore reports on the impact of creating transcription
protocols, using the language model at full scale, and determining the best way
to use base models in order to increase the performance of HTR models.
Combining all of these elements can increase the performance of a single
model by more than 20% (reaching a Character Error Rate below 5%). This article
also discusses challenges raised by the collaborative nature of HTR
platforms such as Transkribus and the ways researchers can share the data
they generate in the process of creating or training handwritten text recognition
models.
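The abstract's headline figure is a Character Error Rate (CER) below 5%. As a minimal sketch (not the project's own code), CER is conventionally computed as the Levenshtein edit distance between the model's transcription and the ground-truth text, divided by the length of the ground truth; the example strings below are illustrative, not taken from the corpus:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance
    (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    return levenshtein(hypothesis, reference) / len(reference)

# One substituted character in an 18-character reference:
print(cer("la Nouuelle France", "la Nouvelle France"))
```

A CER below 0.05 therefore means fewer than five character-level errors per hundred ground-truth characters, averaged over the test pages.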
Related papers
- Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models [0.0]
We apply TrOCR, a state-of-the-art transformer-based HTR model, to 16th-century Latin manuscripts authored by Rudolf Gwalther.
We introduce four novel augmentation methods designed specifically for historical handwriting characteristics.
arXiv Detail & Related papers (2025-08-15T14:20:58Z) - Quo Vadis Handwritten Text Generation for Handwritten Text Recognition? [34.1205194877339]
The digitization of historical manuscripts presents significant challenges for Handwritten Text Recognition (HTR) systems.
Handwritten Text Generation (HTG) techniques generate synthetic data tailored to specific handwriting styles.
We compare three state-of-the-art styled HTG models to assess their impact on HTR fine-tuning.
arXiv Detail & Related papers (2025-08-13T16:39:18Z) - Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (clp) [0.0]
This study presents an efficient fine-tuning methodology to enhance the information retrieval performance of pre-trained text embedding models.
The proposed methodology achieves significant performance improvements over existing methods in document retrieval tasks.
arXiv Detail & Related papers (2024-12-23T07:55:22Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach [35.50318959678818]
We propose DOLPHIN, a novel retrieval model designed to enhance handwriting representations through synergistic temporal-frequency analysis.
We introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals.
Our findings emphasize the significance of point sampling frequency and pressure features in improving handwriting representation quality.
arXiv Detail & Related papers (2024-12-16T11:19:22Z) - Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z) - RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning [69.23782518456932]
We propose a novel zero-shot video captioning framework named Retrieval-Enhanced Test-Time Adaptation (RETTA).
We bridge video and text using four key models: a general video-text retrieval model XCLIP, a general image-text matching model CLIP, a text alignment model AnglE, and a text generation model GPT-2.
To address this problem, we propose using learnable tokens as a communication medium among these four frozen models GPT-2, XCLIP, CLIP, and AnglE.
arXiv Detail & Related papers (2024-05-11T16:22:00Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - Self-Supervised Representation Learning for Online Handwriting Text
Classification [0.8594140167290099]
We propose the novel Part of Stroke Masking (POSM) as a pretext task for pretraining models to extract informative representations from the online handwriting of individuals in English and Chinese languages.
To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods.
The pretrained models are fine-tuned to achieve state-of-the-art results in tasks such as writer identification, gender classification, and handedness classification.
arXiv Detail & Related papers (2023-10-10T14:07:49Z) - The Languini Kitchen: Enabling Language Modelling Research at Different
Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - RenAIssance: A Survey into AI Text-to-Image Generation in the Era of
Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model used for the generation of images through the systematic introduction of noises with repeating steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z) - How to Choose Pretrained Handwriting Recognition Models for Single
Writer Fine-Tuning [23.274139396706264]
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on modern and historical manuscripts.
Those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting.
In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model.
We give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as little as five real fine-tuning lines
arXiv Detail & Related papers (2023-05-04T07:00:28Z) - New Results for the Text Recognition of Arabic Maghribī Manuscripts
-- Managing an Under-resourced Script [0.0]
We introduce and assess a new modus operandi for HTR model development and fine-tuning dedicated to the Arabic Maghribī scripts.
The comparison between several state-of-the-art HTR models demonstrates the relevance of a word-based neural approach specialized for Arabic.
Results open new perspectives for Arabic scripts processing and more generally for poorly-endowed languages processing.
arXiv Detail & Related papers (2022-11-29T12:21:41Z) - PART: Pre-trained Authorship Representation Transformer [52.623051272843426]
Authors writing documents imprint identifying information within their texts.
Previous works use hand-crafted features or classification tasks to train their authorship models.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Continuous Offline Handwriting Recognition using Deep Learning Models [0.0]
Handwritten text recognition is an open problem of great interest in the area of automatic document image analysis.
We have proposed a new recognition model based on integrating two types of deep learning architectures: convolutional neural networks (CNN) and sequence-to-sequence (seq2seq).
The new proposed model provides competitive results with those obtained with other well-established methodologies.
arXiv Detail & Related papers (2021-12-26T07:31:03Z) - Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z) - A Multi-Level Attention Model for Evidence-Based Fact Checking [58.95413968110558]
We present a simple model that can be trained on sequence structures.
Results on a large-scale dataset for Fact Extraction and VERification show that our model outperforms the graph-based approaches.
arXiv Detail & Related papers (2021-06-02T05:40:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.