Improving Handwritten OCR with Training Samples Generated by Glyph
Conditional Denoising Diffusion Probabilistic Model
- URL: http://arxiv.org/abs/2305.19543v1
- Date: Wed, 31 May 2023 04:18:30 GMT
- Title: Improving Handwritten OCR with Training Samples Generated by Glyph
Conditional Denoising Diffusion Probabilistic Model
- Authors: Haisong Ding, Bozhi Luan, Dongnan Gui, Kai Chen, Qiang Huo
- Abstract summary: We propose a denoising diffusion probabilistic model (DDPM) to generate training samples.
This model creates mappings between printed characters and handwritten images.
The text contents of synthetic images are not always consistent with the glyph conditional images.
We propose a progressive data filtering strategy that adds samples whose labels are correct with high confidence to the training set.
- Score: 10.239782333441031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Constructing a highly accurate handwritten OCR system requires large amounts
of representative training data, which is both time-consuming and expensive to
collect. To mitigate the issue, we propose a denoising diffusion probabilistic
model (DDPM) to generate training samples. This model conditions on a printed
glyph image and creates mappings between printed characters and handwritten
images, thus enabling the generation of photo-realistic handwritten samples
with diverse styles and unseen text contents. However, the text contents in
synthetic images are not always consistent with the glyph conditional images,
leading to unreliable labels of synthetic samples. To address this issue, we
further propose a progressive data filtering strategy to add those samples with
a high confidence of correctness to the training set. Experimental results on
the IAM benchmark task show that an OCR model trained with augmented
DDPM-synthesized training samples achieves about a 45% relative word error
rate reduction compared with one trained on real data only.
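As a rough illustration of the progressive data filtering idea described above, the sketch below accepts a DDPM-synthesized sample only when a recognizer's prediction agrees with the glyph-conditioned label at high confidence, relaxing the confidence bar round by round. This is a minimal sketch, not the authors' implementation: the callables (train, recognize, synthesize) and the threshold schedule are hypothetical placeholders.

from typing import Callable, Iterable, List, Tuple

Sample = Tuple[object, str]  # (handwritten image, text label)

def progressive_filtering(
    train: Callable[[List[Sample]], None],             # retrains the OCR model on the given data (hypothetical)
    recognize: Callable[[object], Tuple[str, float]],  # returns (predicted text, confidence) (hypothetical)
    synthesize: Callable[[], Iterable[Sample]],        # yields DDPM samples labeled by their glyph condition (hypothetical)
    real_data: List[Sample],
    thresholds: Iterable[float] = (0.9, 0.8, 0.7),     # assumed schedule, relaxed round by round
) -> List[Sample]:
    """Progressively add synthetic samples whose labels appear correct."""
    training_set = list(real_data)
    for tau in thresholds:
        train(training_set)                 # retrain on everything accepted so far
        for image, label in synthesize():
            pred, conf = recognize(image)
            # Accept a synthetic sample only if the recognizer agrees with the
            # glyph-conditioned label and is confident enough for this round.
            if pred == label and conf >= tau:
                training_set.append((image, label))
    return training_set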
Related papers
- Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing [71.29488677105127]
Existing scene text recognition (STR) methods struggle to recognize challenging text, especially artistic and severely distorted characters.
We propose a contrastive learning-based STR framework by leveraging synthetic and real unlabeled data without any human cost.
Our method achieves SOTA performance (94.7% and 70.9% average accuracy on common benchmarks and Union14M-Benchmark, respectively).
arXiv Detail & Related papers (2024-11-23T15:24:47Z) - Debiasing Vision-Language Models with Text-Only Training [15.069736314663352]
We propose a Text-Only Debiasing framework called TOD, leveraging a text-as-image training paradigm to mitigate visual biases.
arXiv Detail & Related papers (2024-10-12T04:34:46Z) - CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion [58.64822817224639]
Diffusion models have a tendency to exactly replicate their training data, especially when trained on small datasets.
We present CPSample, a method that modifies the sampling process to prevent training data replication while preserving image quality.
CPSample achieves FID scores of 4.97 and 2.97 on CIFAR-10 and CelebA-64, respectively, without producing exact replicates of the training data.
arXiv Detail & Related papers (2024-09-11T05:42:01Z) - UDiffText: A Unified Framework for High-quality Text Synthesis in
Arbitrary Images via Character-aware Diffusion Models [25.219960711604728]
This paper proposes a novel approach for text image generation, utilizing a pre-trained diffusion model.
Our approach involves the design and training of a light-weight character-level text encoder, which replaces the original CLIP encoder.
By employing an inference stage refinement process, we achieve a notably high sequence accuracy when synthesizing text in arbitrarily given images.
arXiv Detail & Related papers (2023-12-08T07:47:46Z) - ALIP: Adaptive Language-Image Pre-training with Synthetic Caption [78.93535202851278]
Contrastive Language-Image Pre-training (CLIP) has significantly boosted the performance of various vision-language tasks.
The presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
We propose an Adaptive Language-Image Pre-training (ALIP), a bi-path model that integrates supervision from both raw text and synthetic caption.
arXiv Detail & Related papers (2023-08-16T15:19:52Z) - Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z) - Zero-shot Generation of Training Data with Denoising Diffusion
Probabilistic Model for Handwritten Chinese Character Recognition [11.186226578337125]
There are more than 80,000 character categories in Chinese but most are rarely used.
To build a high performance handwritten Chinese character recognition system, many training samples need to be collected for each character category.
We propose a novel approach to transforming Chinese character glyph images generated from font libraries into handwritten ones.
arXiv Detail & Related papers (2023-05-25T02:13:37Z) - Text-Conditioned Sampling Framework for Text-to-Image Generation with
Masked Generative Models [52.29800567587504]
We propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information.
TCTS improves not only the image quality but also the semantic alignment of the generated images with the given texts.
We validate the efficacy of TCTS combined with Frequency Adaptive Sampling (FAS) with various generative tasks, demonstrating that it significantly outperforms the baselines in image-text alignment and image quality.
arXiv Detail & Related papers (2023-04-04T03:52:49Z) - Discriminative Class Tokens for Text-to-Image Diffusion Models [107.98436819341592]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text.
Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images.
We evaluate our method extensively, showing that the generated images (i) are more accurate and of higher quality than those from standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z) - Self-Training of Handwritten Word Recognition for Synthetic-to-Real
Adaptation [4.111899441919165]
We propose a self-training approach to train a Handwritten Text Recognition model.
The proposed training scheme uses an initial model trained on synthetic data to make predictions for the unlabeled target dataset.
We evaluate the proposed method on four widely used benchmark datasets and show its effectiveness in closing the gap to a model trained in a fully-supervised manner.
arXiv Detail & Related papers (2022-06-07T09:43:25Z) - GLIDE: Towards Photorealistic Image Generation and Editing with
Text-Guided Diffusion Models [16.786221846896108]
We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance.
We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples.
Our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing.
arXiv Detail & Related papers (2021-12-20T18:42:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.