Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition
- URL: http://arxiv.org/abs/2106.11613v1
- Date: Tue, 22 Jun 2021 08:49:03 GMT
- Title: Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition
- Authors: Jingye Chen, Bin Li, Xiangyang Xue
- Abstract summary: We propose a stroke-based method by decomposing each character into a sequence of strokes.
We employ a matching-based strategy to transform the predicted stroke sequence to a specific character.
The proposed method can be easily generalized to other languages whose characters can be decomposed into strokes.
- Score: 37.808021793372504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chinese character recognition has attracted much research interest due to its
wide applications. Although it has been studied for many years, some issues in
this field have not been completely resolved yet, e.g. the zero-shot problem.
Previous character-based and radical-based methods have not fundamentally
addressed the zero-shot problem since some characters or radicals in test sets
may not appear in training sets under a data-hungry condition. Inspired by the
fact that humans can generalize to know how to write characters unseen before
if they have learned stroke orders of some characters, we propose a
stroke-based method by decomposing each character into a sequence of strokes,
which are the most basic units of Chinese characters. However, we observe that
there is a one-to-many relationship between stroke sequences and Chinese
characters. To tackle this challenge, we employ a matching-based strategy to
transform the predicted stroke sequence to a specific character. We evaluate
the proposed method on handwritten characters, printed artistic characters, and
scene characters. The experimental results validate that the proposed method
outperforms existing methods on both character zero-shot and radical zero-shot
tasks. Moreover, the proposed method can be easily generalized to other
languages whose characters can be decomposed into strokes.
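The one-to-many issue mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the five-category stroke coding (1 = horizontal, 2 = vertical, 3 = left-falling, 4 = right-falling or dot, 5 = turning), the edit-distance lookup, and the `similarity` scores are all assumptions made for illustration. The abstract only states that a matching-based strategy maps the predicted stroke sequence to a specific character.

```python
from collections import defaultdict

# Hypothetical toy lexicon: character -> stroke sequence.
# Several characters share one sequence, which is the one-to-many problem.
LEXICON = {
    "土": "121",
    "士": "121",
    "工": "121",
    "人": "34",
    "入": "34",
    "十": "12",
    "大": "134",
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two stroke sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def candidates(pred: str) -> list[str]:
    """Return all characters whose stroke sequence is nearest to `pred`."""
    by_seq = defaultdict(list)
    for ch, seq in LEXICON.items():
        by_seq[seq].append(ch)
    best = min(by_seq, key=lambda seq: edit_distance(pred, seq))
    return by_seq[best]


def match(pred: str, similarity: dict[str, float]) -> str:
    """Pick one candidate using assumed per-character similarity scores."""
    return max(candidates(pred), key=lambda ch: similarity.get(ch, 0.0))


if __name__ == "__main__":
    print(candidates("121"))
    print(match("121", {"土": 0.2, "士": 0.9, "工": 0.1}))
```

In this sketch a noisy prediction such as `"1211"` is first snapped to the nearest lexicon sequence, and the remaining ambiguity among characters sharing that sequence is resolved by whatever similarity signal is available (here a hand-written dictionary standing in for a learned matching score).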
Related papers
- HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition [47.86479271322264]
We propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters.
HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character.
This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features.
arXiv Detail & Related papers (2024-03-20T17:20:48Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model by aligning printed character images with Ideographic Description Sequences (IDS).
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
- Chinese Character Recognition with Radical-Structured Stroke Trees [51.8541677234175]
We represent each Chinese character as a stroke tree, which is organized according to its radical structures.
We propose a two-stage decomposition framework, where a Feature-to-Radical Decoder perceives radical structures and radical regions.
A Radical-to-Stroke Decoder further predicts the stroke sequences according to the features of radical regions.
arXiv Detail & Related papers (2022-11-24T10:28:55Z)
- STAR: Zero-Shot Chinese Character Recognition with Stroke- and Radical-Level Decompositions [14.770409889132539]
We propose an effective zero-shot Chinese character recognition method by combining stroke- and radical-level decompositions.
Numerical results show that the proposed method outperforms the state-of-the-art methods in both character and radical zero-shot settings.
arXiv Detail & Related papers (2022-10-16T08:57:46Z)
- Stroke-Based Autoencoders: Self-Supervised Learners for Efficient Zero-Shot Chinese Character Recognition [4.64065792373245]
We develop a stroke-based autoencoder to model the sophisticated morphology of Chinese characters.
Our SAE architecture outperforms other existing methods in zero-shot recognition.
arXiv Detail & Related papers (2022-07-17T14:39:10Z)
- ZiGAN: Fine-grained Chinese Calligraphy Font Generation via a Few-shot Style Transfer Approach [7.318027179922774]
ZiGAN is a powerful end-to-end Chinese calligraphy font generation framework.
It does not require any manual operation or redundant preprocessing to generate fine-grained target-style characters.
Our method has a state-of-the-art generalization ability in few-shot Chinese character style transfer.
arXiv Detail & Related papers (2021-08-08T09:50:20Z)
- SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining [48.880840711568425]
We study the influences of three main factors on the Chinese tokenization for pretrained language models.
We propose three kinds of tokenizers, including 1) SHUOWEN (meaning Talk Word), the pronunciation-based tokenizers, and 2) JIEZI (meaning Solve Character), the glyph-based tokenizers.
We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
arXiv Detail & Related papers (2021-06-01T11:20:02Z)
- Hippocampus-heuristic Character Recognition Network for Zero-shot Learning [3.720802292070508]
This paper proposes a novel Hippocampus-heuristic Character Recognition Network (HCRN).
HCRN can recognize unseen Chinese characters (i.e., zero-shot learning) after training on only part of the radicals.
It can accurately predict about 16,330 unseen test characters while relying on only 500 training characters.
arXiv Detail & Related papers (2021-04-06T01:57:20Z)
- Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces [60.58900627906269]
We propose a pre-trained language model as the substitute generator, using sentence-pieces to craft adversarial examples in Chinese.
The substitutions in the generated adversarial examples are not characters or words but 'pieces', which are more natural to Chinese readers.
arXiv Detail & Related papers (2020-12-29T14:28:07Z)
- Neural Computing for Online Arabic Handwriting Character Recognition using Hard Stroke Features Mining [0.0]
An enhanced method of detecting the desired critical points from vertical and horizontal direction-length of handwriting stroke features of online Arabic script recognition is proposed.
A minimum feature set is extracted from these tokens for classification of characters using a multilayer perceptron with a back-propagation learning algorithm and modified sigmoid function-based activation function.
The proposed method achieves an average accuracy of 98.6%, comparable to state-of-the-art character recognition techniques.
arXiv Detail & Related papers (2020-05-02T23:17:08Z)