Toward Zero-shot Character Recognition: A Gold Standard Dataset with
Radical-level Annotations
- URL: http://arxiv.org/abs/2308.00655v1
- Date: Tue, 1 Aug 2023 16:41:30 GMT
- Title: Toward Zero-shot Character Recognition: A Gold Standard Dataset with
Radical-level Annotations
- Authors: Xiaolei Diao, Daqian Shi, Jian Li, Lida Shi, Mingzhe Yue, Ruihua Qi,
Chuntao Li, Hao Xu
- Abstract summary: In this paper, we construct an ancient Chinese character image dataset that contains both radical-level and character-level annotations.
To increase the adaptability of ACCID, we propose a splicing-based synthetic character algorithm to augment the training samples and apply an image denoising method to improve the image quality.
- Score: 5.761679637905164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical character recognition (OCR) methods have been applied to diverse
tasks, e.g., street view text recognition and document analysis. Recently,
zero-shot OCR has piqued the interest of the research community because it
considers a practical OCR scenario with unbalanced data distribution. However,
there is a lack of benchmarks for evaluating such zero-shot methods that apply
a divide-and-conquer recognition strategy by decomposing characters into
radicals. Meanwhile, radical recognition, as another important OCR task, also
lacks radical-level annotation for model training. In this paper, we construct
ACCID, an ancient Chinese character image dataset that contains both
radical-level and character-level annotations to satisfy the requirements of
the above-mentioned methods. The radical-level annotations include radical
categories, radical locations, and structural relations. To increase the
adaptability of ACCID, we propose a splicing-based synthetic character
algorithm to augment the training samples and apply an image denoising method
to improve the image quality. By introducing character decomposition and
recombination, we propose a baseline method for zero-shot OCR. The experimental
results demonstrate the validity of ACCID and the baseline model quantitatively
and qualitatively.
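The decomposition-and-recombination idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' baseline: the table `DECOMPOSITIONS`, the structure labels, and the helpers `build_index` and `recombine` are hypothetical names introduced here, and the entries are ordinary modern-character decompositions rather than ACCID annotations. The sketch assumes a separate recognizer has already predicted the radicals and their structural relation, and shows only the lookup that maps that prediction to a character.

```python
from typing import Dict, Optional, Sequence, Tuple

# A decomposition is a structural relation plus an ordered tuple of radicals.
Decomposition = Tuple[str, Tuple[str, ...]]

# Hypothetical table for illustration only (not taken from ACCID).
DECOMPOSITIONS: Dict[str, Decomposition] = {
    "好": ("left-right", ("女", "子")),
    "明": ("left-right", ("日", "月")),
    "李": ("top-bottom", ("木", "子")),
    "杏": ("top-bottom", ("木", "口")),
}


def build_index(table: Dict[str, Decomposition]) -> Dict[Decomposition, str]:
    """Invert the table so a predicted decomposition maps back to a character."""
    return {decomposition: char for char, decomposition in table.items()}


def recombine(
    structure: str,
    radicals: Sequence[str],
    index: Dict[Decomposition, str],
) -> Optional[str]:
    """Return the character matching the predicted structure and radicals, if any."""
    return index.get((structure, tuple(radicals)))


INDEX = build_index(DECOMPOSITIONS)

# Suppose no image of 李 was seen in training, but its radicals 木 and 子
# were seen inside 杏 and 好. Predicting them plus the structure is enough
# to name the unseen character through the lookup.
print(recombine("top-bottom", ["木", "子"], INDEX))
```

The lookup returns `None` when the predicted combination is not in the table, which is one simple way to flag a prediction that cannot be resolved to a known character.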
Related papers
- DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multi-fonts, mixed scenes and complex layouts seriously affect the recognition accuracy of traditional OCR models.
We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z)
- Graph-level Protein Representation Learning by Structure Knowledge Refinement [50.775264276189695]
This paper focuses on learning representation on the whole graph level in an unsupervised manner.
We propose a novel framework called Structure Knowledge Refinement (SKR) which uses data structure to determine the probability of whether a pair is positive or negative.
arXiv Detail & Related papers (2024-01-05T09:05:33Z)
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match the images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment for zero-shot sketch-based image retrieval (SBKA).
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- Chinese Character Recognition with Radical-Structured Stroke Trees [51.8541677234175]
We represent each Chinese character as a stroke tree, which is organized according to its radical structures.
We propose a two-stage decomposition framework, where a Feature-to-Radical Decoder perceives radical structures and radical regions.
A Radical-to-Stroke Decoder further predicts the stroke sequences according to the features of radical regions.
arXiv Detail & Related papers (2022-11-24T10:28:55Z)
- STAR: Zero-Shot Chinese Character Recognition with Stroke- and Radical-Level Decompositions [14.770409889132539]
We propose an effective zero-shot Chinese character recognition method by combining stroke- and radical-level decompositions.
Numerical results show that the proposed method outperforms the state-of-the-art methods in both character and radical zero-shot settings.
arXiv Detail & Related papers (2022-10-16T08:57:46Z)
- RZCR: Zero-shot Character Recognition via Radical-based Reasoning [17.305603529254608]
RZCR consists of a visual semantic fusion-based radical information extractor (RIE) and a knowledge graph character reasoner (KGR).
RZCR shows promising experimental results, especially on few-sample character datasets.
arXiv Detail & Related papers (2022-07-12T21:12:05Z)
- BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval [61.803481264081036]
Content-Based Image Retrieval (CIR) aims to search for a target image by concurrently comprehending the composition of an example image and a complementary text.
We tackle this task by a novel Bottom-up crOss-modal Semantic compoSition (BOSS) with Hybrid Counterfactual Training framework.
arXiv Detail & Related papers (2022-07-09T07:14:44Z)
- An Evaluation of OCR on Egocentric Data [30.637021477342035]
In this paper, we evaluate state-of-the-art OCR methods on Egocentric data.
We demonstrate that existing OCR methods struggle with rotated text, which is frequently observed on objects being handled.
We introduce a simple rotate-and-merge procedure that can be applied to pre-trained OCR models and halves the normalized edit distance error.
arXiv Detail & Related papers (2022-06-11T10:37:20Z)
- Assessing a Single Image in Reference-Guided Image Synthesis [14.936460594115953]
We propose a learning-based framework, Reference-guided Image Synthesis Assessment (RISA) to quantitatively evaluate the quality of a single generated image.
As this annotation is too coarse as a supervision signal, we introduce two techniques: 1) a pixel-wise scheme to refine the coarse labels, and 2) multiple binary classifiers to replace a naive regressor.
RISA is highly consistent with human preference and transfers well across models.
arXiv Detail & Related papers (2021-12-08T08:22:14Z)
- Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR).
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z)
- Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild [103.51604161298512]
We propose an adversarial learning framework for the generation and recognition of multiple characters in an image.
Our framework can be integrated into recent recognition methods to achieve new state-of-the-art recognition accuracy.
arXiv Detail & Related papers (2020-01-13T12:41:42Z)