Related papers: Content and Style Aware Generation of Text-line Images for Handwriting Recognition

Content and Style Aware Generation of Text-line Images for Handwriting Recognition

URL: http://arxiv.org/abs/2204.05539v1
Date: Tue, 12 Apr 2022 05:52:03 GMT
Title: Content and Style Aware Generation of Text-line Images for Handwriting Recognition
Authors: Lei Kang, Pau Riba, Mar\c{c}al Rusi\~nol, Alicia Forn\'es, Mauricio Villegas
Abstract summary: We propose a generative method for handwritten text-line images conditioned on both visual appearance and textual content. Our method is able to produce long text-line samples with diverse handwriting styles.
Score: 4.301658883577544
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Handwritten Text Recognition has achieved an impressive performance in public benchmarks. However, due to the high inter- and intra-class variability between handwriting styles, such recognizers need to be trained using huge volumes of manually labeled training data. To alleviate this labor-consuming problem, synthetic data produced with TrueType fonts has been often used in the training loop to gain volume and augment the handwriting style variability. However, there is a significant style bias between synthetic and real data which hinders the improvement of recognition performance. To deal with such limitations, we propose a generative method for handwritten text-line images, which is conditioned on both visual appearance and textual content. Our method is able to produce long text-line samples with diverse handwriting styles. Once properly trained, our method can also be adapted to new target data by only accessing unlabeled text-line images to mimic handwritten styles and produce images with any textual content. Extensive experiments have been done on making use of the generated samples to boost Handwritten Text Recognition performance. Both qualitative and quantitative results demonstrate that the proposed approach outperforms the current state of the art.

Related papers

FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL [78.59912944698992]
We propose FocusDiff to enhance fine-grained text-image semantic alignment.<n>We construct a new dataset of paired texts and images with similar overall expressions but distinct local semantics.<n>Our approach achieves state-of-the-art performance on existing text-to-image benchmarks and significantly outperforms prior methods on PairComp.
arXiv Detail & Related papers (2025-06-05T18:36:33Z)
DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [7.398476020996681]
DiffusionPen (DiffPen) is a 5-shot style handwritten text generation approach based on Latent Diffusion Models. Our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples. Our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.
arXiv Detail & Related papers (2024-09-09T20:58:25Z)
Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models. We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning. Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering [118.53208190209517]
We propose a framework to learn the proper textual descriptions for diffusion models through prompt learning. Our method can effectively learn the prompts to improve the matches between the input text and the generated images.
arXiv Detail & Related papers (2024-01-12T03:46:29Z)
WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for styled text-to-text-content-image generation on word-level. Our proposed method is able to generate realistic word image samples from different writer styles. We show that the proposed model produces samples that are aesthetically pleasing, help boosting text recognition performance, and get similar writer retrieval score as real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z)
Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions [52.250269529057014]
Handwritten Text Recognition (HTR) in free-volution pages is a challenging image understanding task. We propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text.
arXiv Detail & Related papers (2022-08-17T06:55:54Z)
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images. Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets. Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by averagely 5.3% on 11 benchmarks, with similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
SLOGAN: Handwriting Style Synthesis for Arbitrary-Length and Out-of-Vocabulary Text [35.83345711291558]
We propose a novel method that can synthesize parameterized and controllable handwriting Styles for arbitrary-Length and Out-of-vocabulary text. We embed the text content by providing an easily obtainable printed style image, so that the diversity of the content can be flexibly achieved. Our method can synthesize words that are not included in the training vocabulary and with various new styles.
arXiv Detail & Related papers (2022-02-23T12:13:27Z)
SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique increasing the performance of current state-of-the-art methods. We combine the well-known patch loss with information gathered from the parallel trained handwritten text recognition system. This leads to a more enhanced local discriminator and results in more realistic and higher-quality generated handwritten words.
arXiv Detail & Related papers (2021-05-21T18:34:21Z)
One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition [10.473427493876422]
Low resource Handwritten Text Recognition is a hard problem due to the scarce annotated data and the very limited linguistic information. In this paper we address this problem through a data generation technique based on Bayesian Program Learning. Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet.
arXiv Detail & Related papers (2021-05-11T18:53:01Z)
Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild [103.51604161298512]
We propose an adversarial learning framework for the generation and recognition of multiple characters in an image. Our framework can be integrated into recent recognition methods to achieve new state-of-the-art recognition accuracy.
arXiv Detail & Related papers (2020-01-13T12:41:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.