Zero-shot Generation of Training Data with Denoising Diffusion
Probabilistic Model for Handwritten Chinese Character Recognition
- URL: http://arxiv.org/abs/2305.15660v1
- Date: Thu, 25 May 2023 02:13:37 GMT
- Authors: Dongnan Gui, Kai Chen, Haisong Ding and Qiang Huo
- Abstract summary: There are more than 80,000 character categories in Chinese but most are rarely used.
To build a high-performance handwritten Chinese character recognition system, many training samples need to be collected for each character category.
We propose a novel approach to transforming Chinese character glyph images generated from font libraries to handwritten ones.
- Score: 11.186226578337125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are more than 80,000 character categories in Chinese while most of them
are rarely used. To build a high performance handwritten Chinese character
recognition (HCCR) system supporting the full character set with a traditional
approach, many training samples need to be collected for each character category,
which is both time-consuming and expensive. In this paper, we propose a novel
approach to transforming Chinese character glyph images generated from font
libraries to handwritten ones with a denoising diffusion probabilistic model
(DDPM). Training from handwritten samples of a small character set, the DDPM is
capable of mapping printed strokes to handwritten ones, which makes it possible
to generate photo-realistic, stylistically diverse handwritten samples of unseen
character categories. Combining DDPM-synthesized samples of unseen categories
with real samples of other categories, we can build an HCCR system to support
the full character set. Experimental results on the CASIA-HWDB dataset with 3,755
character categories show that the HCCR systems trained with synthetic samples
perform similarly to the one trained with real samples in terms of
recognition accuracy. The proposed method has the potential to address HCCR
with a larger vocabulary.
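The abstract names a DDPM that maps printed glyphs to handwritten images but gives no implementation details. The following is a minimal sketch of the standard DDPM forward (noising) step that such a model would be trained to invert, not the authors' implementation. The linear noise schedule, the step count of 1000, the toy four-pixel images, and the idea of feeding the printed glyph to the denoiser as an extra conditioning channel are all assumptions made for illustration; every function name here is hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_sample(x0, t, alpha_bars, rng):
    """Draw x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def denoiser_input(xt, glyph):
    """Pair each noisy pixel with the printed-glyph pixel (assumed conditioning)."""
    return list(zip(xt, glyph))


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

handwritten = [0.0, 1.0, 1.0, 0.0]  # toy flattened handwritten image
glyph = [0.0, 1.0, 0.0, 0.0]        # toy flattened printed glyph of the same character

xt, eps = noise_sample(handwritten, 500, alpha_bars, rng)
model_in = denoiser_input(xt, glyph)  # a denoiser would be trained to predict eps from this
```

During training, a network would receive `model_in` and the step index and be asked to predict `eps`. At generation time, a glyph rendered from a font for an unseen category would supply the conditioning channel while the handwritten channel starts from pure noise.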
Related papers
- Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing [71.29488677105127]
Existing scene text recognition (STR) methods struggle to recognize challenging texts, especially for artistic and severely distorted characters.
We propose a contrastive learning-based STR framework by leveraging synthetic and real unlabeled data without any human cost.
Our method achieves SOTA performance (94.7% and 70.9% average accuracy on common benchmarks and Union14M-Benchmark).
arXiv Detail & Related papers (2024-11-23T15:24:47Z)
- MetaScript: Few-Shot Handwritten Chinese Content Generation via
Generative Adversarial Networks [15.037121719502606]
We propose MetaScript, a novel content generation system designed to address the diminishing presence of personal handwriting styles in the digital representation of Chinese characters.
Our approach harnesses the power of few-shot learning to generate Chinese characters that retain the individual's unique handwriting style and maintain the efficiency of digital typing.
arXiv Detail & Related papers (2023-12-25T17:31:19Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through
Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model by aligning printed character images with Ideographic Description Sequences (IDS).
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
- Sampling and Ranking for Digital Ink Generation on a tight computational
budget [69.15275423815461]
We study ways to maximize the quality of the output of a trained digital ink generative model.
We use and compare the effect of multiple sampling and ranking techniques, in the first ablation study of its kind in the digital ink domain.
arXiv Detail & Related papers (2023-06-02T09:55:15Z)
- Improving Handwritten OCR with Training Samples Generated by Glyph
Conditional Denoising Diffusion Probabilistic Model [10.239782333441031]
We propose a denoising diffusion probabilistic model (DDPM) to generate training samples.
This model creates mappings between printed characters and handwritten images.
Synthetic images are not always consistent with the glyph conditional images.
We propose a progressive data filtering strategy to add those samples with a high confidence of correctness to the training set.
arXiv Detail & Related papers (2023-05-31T04:18:30Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on
Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction.
Our approach enables us to massively scale up the number of character types we can effectively model.
We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z)
- ZiGAN: Fine-grained Chinese Calligraphy Font Generation via a Few-shot
Style Transfer Approach [7.318027179922774]
ZiGAN is a powerful end-to-end Chinese calligraphy font generation framework.
It does not require any manual operation or redundant preprocessing to generate fine-grained target-style characters.
Our method has a state-of-the-art generalization ability in few-shot Chinese character style transfer.
arXiv Detail & Related papers (2021-08-08T09:50:20Z)
- Offline Handwritten Chinese Text Recognition with Convolutional Neural
Networks [5.984124397831814]
In this paper, we build the models using only the convolutional neural networks and use CTC as the loss function.
We achieve 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
arXiv Detail & Related papers (2020-06-28T14:34:38Z)
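The last entry above reports a character error rate (CER). As a reminder of how that metric is commonly defined, the sketch below computes CER as the total character-level edit distance divided by the total number of reference characters. This is the usual textbook definition; the exact scoring protocol of the cited paper is not stated here, so treat the function names and the example strings as illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total edit distance over total reference length for paired text lines."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_errors / total_chars


# One substituted character out of four reference characters.
cer = character_error_rate(["手写汉字"], ["手写漢字"])
```

Here `cer` evaluates to 0.25, because one of the four reference characters was recognized incorrectly.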
This list is automatically generated from the titles and abstracts of the papers on this site.