MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning
- URL: http://arxiv.org/abs/2505.20513v1
- Date: Mon, 26 May 2025 20:26:16 GMT
- Title: MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning
- Authors: Wenhao Gu, Li Gu, Ching Yee Suen, Yang Wang
- Abstract summary: Traditional handwritten text recognition methods lack writer-specific personalization at test time. We propose an efficient framework that formulates personalization as prompt tuning. We validate our approach on the RIMES and IAM Handwriting Database benchmarks.
- Score: 6.274266343486906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in handwritten text recognition (HTR) have enabled the effective conversion of handwritten text to digital formats. However, achieving robust recognition across diverse writing styles remains challenging. Traditional HTR methods lack writer-specific personalization at test time due to limitations in model architecture and training strategies. Existing attempts to bridge this gap, through gradient-based meta-learning, still require labeled examples and suffer from parameter-inefficient fine-tuning, leading to substantial computational and memory overhead. To overcome these challenges, we propose an efficient framework that formulates personalization as prompt tuning, incorporating an auxiliary image reconstruction task with a self-supervised loss to guide prompt adaptation with unlabeled test-time examples. To ensure self-supervised loss effectively minimizes text recognition error, we leverage meta-learning to learn the optimal initialization of the prompts. As a result, our method allows the model to efficiently capture unique writing styles by updating less than 1% of its parameters and eliminating the need for time-intensive annotation processes. We validate our approach on the RIMES and IAM Handwriting Database benchmarks, where it consistently outperforms previous state-of-the-art methods while using 20x fewer parameters. We believe this represents a significant advancement in personalized handwritten text recognition, paving the way for more reliable and practical deployment in resource-constrained scenarios.
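The abstract describes test-time adaptation in which a frozen recognizer is conditioned on a small learnable prompt, and only the prompt is updated by minimizing a self-supervised reconstruction loss on unlabeled test examples. The toy sketch below illustrates that loop in NumPy; all names (`reconstruct`, `recon_loss`, the linear "backbone" `W`) are illustrative stand-ins, not the paper's actual model or API.

```python
import numpy as np

# Toy sketch of test-time prompt tuning (illustrative, not the paper's code).
# A frozen "backbone" W is conditioned on a small prompt vector p; only p is
# updated, mimicking adaptation of <1% of parameters on unlabeled data.

rng = np.random.default_rng(0)

D = 16                                   # feature dimension of the toy model
W = rng.standard_normal((D, D)) * 0.1    # frozen backbone weights (never updated)
x = rng.standard_normal(D)               # unlabeled test-time handwriting features

def reconstruct(prompt):
    # Auxiliary self-supervised task: reconstruct the input features from the
    # prompt-conditioned backbone output.
    return W @ (x + prompt)

def recon_loss(prompt):
    r = reconstruct(prompt) - x
    return float(r @ r)

def grad_recon_loss(prompt):
    # Analytic gradient of ||W(x + p) - x||^2 with respect to p.
    return 2.0 * W.T @ (W @ (x + prompt) - x)

prompt = np.zeros(D)          # stands in for the meta-learned initialization
before = recon_loss(prompt)
for _ in range(100):          # a few gradient steps on the unlabeled example
    prompt -= 0.05 * grad_recon_loss(prompt)
after = recon_loss(prompt)

print(after < before)         # adaptation reduced the self-supervised loss
```

In the paper, the meta-learning stage would supply the prompt initialization so that minimizing this auxiliary loss also reduces recognition error; here the zero initialization is just a placeholder.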
Related papers
- Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition [12.228611784356412]
Handwritten Text Recognition (HTR) is essential for document analysis and digitization. Legislation like the "right to be forgotten" underscores the necessity for methods that can expunge sensitive information from trained models. We introduce a novel two-stage unlearning strategy for a multi-head transformer-based HTR model, integrating pruning and random labeling.
arXiv Detail & Related papers (2025-04-11T15:21:12Z) - Personalized Text Generation with Contrastive Activation Steering [63.60368120937822]
We propose a training-free framework that disentangles and represents personalized writing style as a vector. Our framework achieves a significant 8% relative improvement in personalized generation while reducing storage requirements by a factor of 1700 compared to PEFT methods.
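The summary above describes representing a writing style as a single vector obtained contrastively and applied at inference without training. A minimal NumPy sketch of that idea follows; the variable names and the simple mean-difference construction are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of contrastive activation steering. A style is the
# difference between mean hidden activations on a writer's texts and on
# generic texts; adding that vector at inference steers generation,
# with only one D-dimensional vector stored per writer.

rng = np.random.default_rng(1)

D = 8                                                # toy hidden-state dimension
personal_acts = rng.standard_normal((32, D)) + 2.0   # activations on the writer's samples
generic_acts = rng.standard_normal((32, D))          # activations on generic text

# Contrastive style vector: tiny storage footprint versus per-writer fine-tuning.
style_vec = personal_acts.mean(axis=0) - generic_acts.mean(axis=0)

def steer(hidden, alpha=1.0):
    # Inject the style vector into a hidden state during decoding.
    return hidden + alpha * style_vec

h = rng.standard_normal(D)        # a hidden state from the frozen model
steered = steer(h, alpha=0.5)
print(steered.shape)              # (8,)
```

Storing one vector per writer, rather than a set of adapter weights, is what makes the claimed storage reduction over PEFT plausible.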
arXiv Detail & Related papers (2025-03-07T08:07:15Z) - DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning [7.036629164442979]
We introduce the DocTTT framework to address these challenges. The key innovation of our approach is that it uses test-time training to adapt the model to each specific input during testing. We propose a novel Meta-Auxiliary learning approach that combines meta-learning and a self-supervised Masked Autoencoder (MAE).
arXiv Detail & Related papers (2025-01-22T14:18:47Z) - UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z) - Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z) - Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels [0.0]
We introduce the task of comparing a handwriting image to text.
Our model's classification head is trained entirely on synthetic data created using a state-of-the-art generative adversarial network.
Such massive performance gains can lead to significant productivity increases in applications utilizing human-in-the-loop automation.
arXiv Detail & Related papers (2023-09-18T21:13:42Z) - Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models [137.74524357614285]
We introduce a novel Gradient-RegulAted Meta-prompt learning framework.
It helps pre-trained models adapt to downstream tasks in a parameter- and data-efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
arXiv Detail & Related papers (2023-03-12T05:03:37Z) - Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement [8.428866479825736]
Text-DIAE aims to solve two tasks, text recognition (handwritten or scene-text) and document image enhancement.
We define three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data.
Our method surpasses the state-of-the-art significantly in existing supervised and self-supervised settings.
arXiv Detail & Related papers (2022-03-09T15:44:36Z) - Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z) - SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique increasing the performance of current state-of-the-art methods.
We combine the well-known patch loss with information gathered from the parallel trained handwritten text recognition system.
This leads to a more enhanced local discriminator and results in more realistic and higher-quality generated handwritten words.
arXiv Detail & Related papers (2021-05-21T18:34:21Z) - Robust Document Representations using Latent Topics and Metadata [17.306088038339336]
We propose a novel approach to fine-tuning a pre-trained neural language model for document classification problems.
We generate document representations that capture both text and metadata artifacts in a task-specific manner.
Our solution also incorporates metadata explicitly rather than just augmenting them with text.
arXiv Detail & Related papers (2020-10-23T21:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.