MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition
- URL: http://arxiv.org/abs/2104.01876v1
- Date: Mon, 5 Apr 2021 12:35:39 GMT
- Title: MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition
- Authors: Ayan Kumar Bhunia, Shuvozit Ghose, Amandeep Kumar, Pinaki Nath
Chowdhury, Aneeshan Sain, Yi-Zhe Song
- Abstract summary: We propose a new approach to handwritten text recognition.
We use a novel meta-learning framework which exploits additional new-writer data.
Our framework can be easily implemented on top of most state-of-the-art HTR models.
- Score: 36.12001394921506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Handwritten Text Recognition (HTR) remains a challenging problem to date,
largely due to the varying writing styles that exist amongst us. Prior works,
however, generally operate with the assumption that there is a limited number of
styles, most of which have already been captured by existing datasets. In this
paper, we take a completely different perspective -- we work on the assumption
that there is always a new style that is drastically different, and that we
will only have very limited data during testing to perform adaptation. This
results in a commercially viable solution -- the model has the best shot at
adaptation by being exposed to the new style, and the few-sample nature makes it
practical to implement. We achieve this via a novel meta-learning framework
which exploits additional new-writer data through a support set, and outputs a
writer-adapted model via single gradient step update, all during inference. We
discover and leverage the important insight that there exist a few key
characters per writer that exhibit relatively larger style discrepancies. To
exploit this, we additionally propose to meta-learn instance-specific weights for a
character-wise cross-entropy loss, which is specifically designed to work with
the sequential nature of text data. Our writer-adaptive MetaHTR framework can
be easily implemented on top of most state-of-the-art HTR models.
Experiments show that an average performance gain of 5-7% can be obtained by
observing very few new-style samples. We further demonstrate via a set of ablative
studies the advantage of our meta design when compared with alternative
adaptation mechanisms.
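The two ingredients described in the abstract -- a single inner-loop gradient step on a new writer's support set, and a character-wise cross-entropy loss with meta-learned instance-specific weights -- can be sketched as follows. This is a minimal NumPy illustration on a toy linear character classifier, not the authors' implementation; the function names, the toy model, and the uniform weights are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_char_ce(W, X, y, w):
    """Character-wise cross-entropy with per-character weights.

    X: (T, D) features for T characters, y: (T,) labels,
    w: (T,) instance-specific weights (meta-learned in MetaHTR)."""
    p = softmax(X @ W)                                  # (T, C) probabilities
    nll = -np.log(p[np.arange(len(y)), y] + 1e-12)      # per-character loss
    return float((w * nll).mean())

def ce_grad(W, X, y, w):
    """Gradient of weighted_char_ce with respect to W."""
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0                      # dCE/dlogits
    return X.T @ (w[:, None] * p) / len(y)              # (D, C)

def adapt_to_writer(W, support_X, support_y, char_weights, lr=0.1):
    """Single inner-loop gradient step on the writer's support set,
    producing a writer-adapted copy of the parameters (MAML-style)."""
    return W - lr * ce_grad(W, support_X, support_y, char_weights)

# Toy demo: a handful of support characters from a "new writer".
rng = np.random.default_rng(0)
T, D, C = 20, 8, 5                                      # chars, feat dim, classes
X = rng.normal(size=(T, D))
y = rng.integers(0, C, size=T)
w = np.ones(T)                                          # uniform weights here
W0 = np.zeros((D, C))                                   # writer-agnostic params
W1 = adapt_to_writer(W0, X, y, w)
print(weighted_char_ce(W0, X, y, w), ">", weighted_char_ce(W1, X, y, w))
```

In the actual framework the outer meta-training loop would optimize both the initialization and the instance weights so that this one-step update generalizes to the writer's query set; here the single step simply lowers the support-set loss.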
Related papers
- DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [7.398476020996681]
DiffusionPen (DiffPen) is a 5-shot style handwritten text generation approach based on Latent Diffusion Models.
Our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples.
Our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.
arXiv Detail & Related papers (2024-09-09T20:58:25Z) - Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents with a stylistic constraint.
We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship Corpus, and IMDb62.
arXiv Detail & Related papers (2024-07-18T10:01:09Z) - How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning [23.274139396706264]
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on modern and historical manuscripts.
However, those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting.
In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model.
We give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as little as five real fine-tuning lines.
arXiv Detail & Related papers (2023-05-04T07:00:28Z) - Towards Writing Style Adaptation in Handwriting Recognition [0.0]
We explore models with writer-dependent parameters which take the writer's identity as an additional input.
We propose a Writer Style Block (WSB), an adaptive instance normalization layer conditioned on learned embeddings of the partitions.
We show that our approach outperforms a baseline with no WSB in a writer-dependent scenario and that it is possible to estimate embeddings for new writers.
arXiv Detail & Related papers (2023-02-13T12:36:17Z) - Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only 0.3% of model parameters to learn style-specific attributes for response generation.
We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z) - Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and are proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - DeepStyle: User Style Embedding for Authorship Attribution of Short Texts [57.503904346336384]
Authorship attribution (AA) is an important and widely studied research topic with many applications.
Recent works have shown that deep learning methods could achieve significant accuracy improvement for the AA task.
We propose DeepStyle, a novel embedding-based framework that learns the representations of users' salient writing styles.
arXiv Detail & Related papers (2021-03-14T15:56:37Z) - Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z) - ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation [0.9542023122304099]
We present ScrabbleGAN, a semi-supervised approach to synthesize handwritten text images.
ScrabbleGAN relies on a novel generative model which can generate images of words with an arbitrary length.
arXiv Detail & Related papers (2020-03-23T21:41:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.