Stroke-Based Autoencoders: Self-Supervised Learners for Efficient
Zero-Shot Chinese Character Recognition
- URL: http://arxiv.org/abs/2207.08191v1
- Date: Sun, 17 Jul 2022 14:39:10 GMT
- Title: Stroke-Based Autoencoders: Self-Supervised Learners for Efficient
Zero-Shot Chinese Character Recognition
- Authors: Zongze Chen and Wenxia Yang and Xin Li
- Abstract summary: We develop a stroke-based autoencoder to model the sophisticated morphology of Chinese characters.
Our SAE architecture outperforms other existing methods in zero-shot recognition.
- Score: 4.64065792373245
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chinese characters carry a wealth of morphological and semantic information;
therefore, semantic enhancement based on the morphology of Chinese characters has
drawn significant attention. Previous methods extract information directly from whole
Chinese character images and usually cannot capture global and local information
simultaneously. In this paper, we develop a stroke-based autoencoder (SAE) to model the
sophisticated morphology of Chinese characters in a self-supervised manner. We first
represent a Chinese character as a sequence of stroke images in its canonical writing
order, and then train the SAE model to reconstruct this stroke image sequence. The
pre-trained SAE model can predict the stroke image sequence for unseen characters, as
long as their strokes or radicals appear in the training set. We design two contrasting
SAE architectures that operate on different forms of stroke images. One is fine-tuned on
an existing stroke-based method for zero-shot recognition of handwritten Chinese
characters, and the other is applied to enrich Chinese word embeddings with
morphological features. The experimental results show that, after pre-training, our SAE
architectures outperform existing methods in zero-shot recognition and enhance the
representation of Chinese characters with their abundant morphological and semantic
information.
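The pipeline described in the abstract lends itself to a compact illustration. The following is a minimal, hypothetical sketch of a stroke-sequence autoencoder in PyTorch: each character is padded to a fixed number of stroke images, a per-stroke CNN encodes each image, a small Transformer contextualises the strokes in writing order, and a deconvolutional head reconstructs the stroke images. The module sizes, image resolution, padding length, and the choice of a Transformer sequence encoder are assumptions made for the sketch, not the paper's actual architecture.

```python
# Hedged sketch of a stroke-based autoencoder (SAE); sizes are illustrative assumptions.
import torch
import torch.nn as nn

MAX_STROKES, IMG = 30, 32          # assumed: pad every character to 30 stroke images of 32x32


class StrokeSAE(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        # per-stroke CNN encoder: one stroke image -> one d_model vector
        self.stroke_enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (IMG // 4) ** 2, d_model),
        )
        # sequence model over the strokes in canonical writing order
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.seq_enc = nn.TransformerEncoder(enc_layer, num_layers=2)
        # per-stroke decoder: d_model vector -> reconstructed stroke image
        self.stroke_dec = nn.Sequential(
            nn.Linear(d_model, 32 * (IMG // 4) ** 2),
            nn.Unflatten(1, (32, IMG // 4, IMG // 4)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, strokes):                        # strokes: (B, S, 1, IMG, IMG)
        b, s = strokes.shape[:2]
        z = self.stroke_enc(strokes.flatten(0, 1))     # (B*S, d_model)
        z = self.seq_enc(z.view(b, s, -1))             # contextualise strokes in writing order
        recon = self.stroke_dec(z.flatten(0, 1))       # (B*S, 1, IMG, IMG)
        return recon.view(b, s, 1, IMG, IMG)


model = StrokeSAE()
x = torch.rand(4, MAX_STROKES, 1, IMG, IMG)            # a batch of stroke-image sequences
loss = nn.functional.binary_cross_entropy(model(x), x)  # reconstruct the stroke sequence
```

In the paper's setting, such a pre-trained encoder would then be fine-tuned for zero-shot handwritten character recognition or used to derive morphological features for Chinese word embeddings.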
Related papers
- Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training [68.41837295318152]
Diffusion-based text-to-image models have demonstrated impressive achievements in diversity and aesthetics but struggle to generate images with visual texts.
Existing backbone models have limitations such as misspelling, failure to generate text, and lack of support for Chinese text.
We propose a series of methods, aiming to empower backbone models to generate visual texts in English and Chinese.
arXiv Detail & Related papers (2024-10-06T10:25:39Z)
- Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation [82.5217996570387]
We adapt a pre-trained language model for auto-regressive text-to-image generation.
We find that pre-trained language models offer limited help.
arXiv Detail & Related papers (2023-11-27T07:19:26Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model by aligning printed character images with Ideographic Description Sequences (IDS); a generic sketch of this alignment step appears after this list.
This pre-training stage simulates how humans recognize Chinese characters and obtains a canonical representation of each character.
The learned representations are then employed to supervise the CTR model, so that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
- Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
- Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration [25.49394055539858]
We propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration.
This method consists of three parts: image registration-based stroke registration, which establishes a rough registration between the reference strokes and the target as prior information; image semantic segmentation-based stroke segmentation, which preliminarily separates target strokes into seven categories; and high-precision extraction of single strokes.
In the stroke registration, we propose a structure-deformable image registration network to achieve structure-deformable transformation while maintaining the stable morphology of single strokes for character images with complex structures.
arXiv Detail & Related papers (2023-07-10T04:50:17Z)
- Single-Stream Multi-Level Alignment for Vision-Language Pretraining [103.09776737512078]
We propose a single-stream model that aligns the modalities at multiple levels.
We achieve this using two novel tasks: symmetric cross-modality reconstruction and pseudo-labeled key word prediction.
We demonstrate top performance on a set of Vision-Language downstream tasks such as zero-shot/fine-tuned image/text retrieval, referring expression, and VQA.
arXiv Detail & Related papers (2022-03-27T21:16:10Z)
- GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph [31.723483415041347]
Previous works indicate that the glyphs of Chinese characters contain rich semantic information.
We propose a Chinese pre-trained representation model named GlyphCRM.
It abandons the ID-based character embedding method and is instead based solely on sequential character images.
arXiv Detail & Related papers (2021-07-01T12:14:05Z)
- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information [32.70080326854314]
We propose ChineseBERT, which incorporates the glyph and pinyin information of Chinese characters into language model pretraining.
The proposed ChineseBERT model yields a significant performance boost over baseline models with fewer training steps.
arXiv Detail & Related papers (2021-06-30T13:06:00Z)
- SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining [48.880840711568425]
We study the influence of three main factors on Chinese tokenization for pretrained language models.
We propose linguistically informed tokenizers, including SHUOWEN (meaning Talk Word), which are pronunciation-based, and JIEZI (meaning Solve Character), which are glyph-based.
We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
arXiv Detail & Related papers (2021-06-01T11:20:02Z)
- CalliGAN: Style and Structure-aware Chinese Calligraphy Character Generator [6.440233787863018]
Chinese calligraphy is the writing of Chinese characters as an art form performed with brushes.
Recent studies show that Chinese characters can be generated through image-to-image translation for multiple styles using a single model.
We propose a novel method that incorporates Chinese characters' component information into such a model.
arXiv Detail & Related papers (2020-05-26T03:15:03Z)
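For the Image-IDS entry referenced above, the alignment step can be pictured as a standard CLIP-style contrastive objective between a character-image encoder and an IDS-sequence encoder. The sketch below is a generic, hypothetical illustration: the encoder architectures, the IDS vocabulary size, and the symmetric cross-entropy loss follow common CLIP practice rather than that paper's actual implementation.

```python
# Hedged, generic sketch of CLIP-style alignment between printed character images
# and Ideographic Description Sequences (IDS); all sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

IDS_VOCAB, IDS_LEN, D = 600, 20, 128      # assumed IDS symbol vocabulary / max sequence length


class ImageIDSAligner(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_enc = nn.Sequential(          # character image -> D-dim embedding
            nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, D),
        )
        self.ids_emb = nn.Embedding(IDS_VOCAB, D)
        self.ids_enc = nn.GRU(D, D, batch_first=True)   # IDS token sequence -> D-dim embedding
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, images, ids_tokens):    # images: (B, 1, H, W); ids_tokens: (B, IDS_LEN)
        v = F.normalize(self.img_enc(images), dim=-1)
        _, h = self.ids_enc(self.ids_emb(ids_tokens))
        t = F.normalize(h[-1], dim=-1)
        return self.logit_scale.exp() * v @ t.T          # (B, B) image-IDS similarity matrix


def clip_loss(logits):
    # symmetric cross-entropy: matched image/IDS pairs lie on the diagonal
    target = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, target) + F.cross_entropy(logits.T, target)) / 2


model = ImageIDSAligner()
imgs = torch.rand(8, 1, 32, 32)
ids = torch.randint(0, IDS_VOCAB, (8, IDS_LEN))
loss = clip_loss(model(imgs, ids))
```

Because IDS describe a character's composition from radicals and components, an encoder trained this way can embed characters it has never seen, which is what makes such alignment useful for zero-shot Chinese character recognition.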