HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition
- URL: http://arxiv.org/abs/2403.13761v1
- Date: Wed, 20 Mar 2024 17:20:48 GMT
- Title: HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition
- Authors: Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin,
- Abstract summary: We propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters.
HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character.
This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features.
- Score: 47.86479271322264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, recognition of Out-Of-Vocabulary (OOV) characters, and on-device deployment due to their computational intensity. To address these challenges, we propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters. HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character. This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features, a notable advantage over existing methods. Extensive experiments across diverse benchmarks, including handwritten, scene, document, web, and ancient text, have showcased HierCode's superiority for both conventional and zero-shot Chinese character or text recognition, exhibiting state-of-the-art performance with significantly fewer parameters and fast inference speed.
Related papers
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through
Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR)
We pre-train a CLIP-like model through aligning printed character images and Ideographic Description Sequences (IDS)
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z) - DTrOCR: Decoder-only Transformer for Optical Character Recognition [0.0]
We propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR)
This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus.
Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
arXiv Detail & Related papers (2023-08-30T12:37:03Z) - Learning Generative Structure Prior for Blind Text Image
Super-resolution [153.05759524358467]
We present a novel prior that focuses more on the character structure.
To restrict the generative space of StyleGAN, we store the discrete features for each character in a codebook.
The proposed structure prior exerts stronger character-specific guidance to restore faithful and precise strokes of a designated character.
arXiv Detail & Related papers (2023-03-26T13:54:28Z) - Enhancing Indic Handwritten Text Recognition Using Global Semantic
Information [36.01828106385858]
We use a semantic module in an encoder-decoder framework for extracting global semantic information to recognize the Indic handwritten texts.
The proposed framework achieves state-of-the-art results on handwritten texts of ten Indic languages.
arXiv Detail & Related papers (2022-12-15T12:53:26Z) - MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining [68.05105411320842]
We propose a novel approach MaskOCR to unify vision and language pre-training in the classical encoder-decoder recognition framework.
We adopt the masked image modeling approach to pre-train the feature encoder using a large set of unlabeled real text images.
We transform text data into synthesized text images to unify the data modalities of vision and language, and enhance the language modeling capability of the sequence decoder.
arXiv Detail & Related papers (2022-06-01T08:27:19Z) - Dual Encoding for Video Retrieval by Text [49.34356217787656]
We propose a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.
Our novelty is two-fold. First, different from prior art that resorts to a specific single-level encoder, the proposed network performs multi-level encoding.
Second, different from a conventional common space learning algorithm which is either concept based or latent space based, we introduce hybrid space learning.
arXiv Detail & Related papers (2020-09-10T15:49:39Z) - Neural Computing for Online Arabic Handwriting Character Recognition
using Hard Stroke Features Mining [0.0]
An enhanced method of detecting the desired critical points from vertical and horizontal direction-length of handwriting stroke features of online Arabic script recognition is proposed.
A minimum feature set is extracted from these tokens for classification of characters using a multilayer perceptron with a back-propagation learning algorithm and modified sigmoid function-based activation function.
The proposed method achieves an average accuracy of 98.6% comparable in state of art character recognition techniques.
arXiv Detail & Related papers (2020-05-02T23:17:08Z) - Separating Content from Style Using Adversarial Learning for Recognizing
Text in the Wild [103.51604161298512]
We propose an adversarial learning framework for the generation and recognition of multiple characters in an image.
Our framework can be integrated into recent recognition methods to achieve new state-of-the-art recognition accuracy.
arXiv Detail & Related papers (2020-01-13T12:41:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.