Benchmarking Chinese Text Recognition: Datasets, Baselines, and an
Empirical Study
- URL: http://arxiv.org/abs/2112.15093v1
- Date: Thu, 30 Dec 2021 15:30:52 GMT
- Title: Benchmarking Chinese Text Recognition: Datasets, Baselines, and an
Empirical Study
- Authors: Jingye Chen, Haiyang Yu, Jianqi Ma, Mengnan Guan, Xixi Xu, Xiaocong
Wang, Shaobo Qu, Bin Li, Xiangyang Xue
- Abstract summary: Existing text recognition methods mainly target English texts and largely ignore the pivotal role of Chinese texts.
We manually collect Chinese text datasets from publicly available competitions, projects, and papers, then divide them into four categories including scene, web, document, and handwriting datasets.
By analyzing the experimental results, we surprisingly observe that state-of-the-art baselines for recognizing English texts cannot perform well on Chinese scenarios.
- Score: 25.609450020149637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of deep learning has driven rapid progress in text
recognition in recent years. However, existing text recognition methods
mainly target English texts and largely ignore the pivotal role of
Chinese texts. Chinese is also a widely spoken language, and Chinese text
recognition has extensive applications. Based on our observations, we
attribute the scarce attention to Chinese text recognition to the lack of
reasonable dataset construction standards, unified evaluation methods, and
results of the existing baselines. To fill this gap, we manually collect
Chinese text datasets from publicly available competitions, projects, and
papers, then divide them into four categories including scene, web, document,
and handwriting datasets. Furthermore, we evaluate a series of representative
text recognition methods on these datasets with unified evaluation methods to
provide experimental results. By analyzing the experimental results, we
surprisingly observe that state-of-the-art baselines for recognizing English
texts cannot perform well on Chinese scenarios. We believe that numerous
challenges remain to be explored because the characteristics of Chinese
texts differ considerably from those of English texts. The code and
datasets are made publicly available at
https://github.com/FudanVI/benchmarking-chinese-text-recognition.
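The abstract does not say which metrics make up the unified evaluation. As a minimal sketch, assuming two metrics that are commonly used for text recognition (exact-match sequence accuracy and a normalized edit distance score), a shared scoring routine could look like the following. The function names `edit_distance` and `evaluate` are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(predictions: Sequence[str], references: Sequence[str]) -> Dict[str, float]:
    """Score predictions against references with two assumed metrics.

    accuracy: fraction of predictions that match the reference exactly.
    normalized_edit_distance: mean of 1 - distance / max(len(pred), len(ref)),
    so that 1.0 means every prediction is perfect.
    """
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equal in length")
    n = len(references)
    exact = sum(p == r for p, r in zip(predictions, references))
    ned = sum(1 - edit_distance(p, r) / max(len(p), len(r), 1)
              for p, r in zip(predictions, references))
    return {"accuracy": exact / n, "normalized_edit_distance": ned / n}


if __name__ == "__main__":
    # One exact match and one prediction with a single wrong character.
    print(evaluate(["你好世界", "文本"], ["你好世界", "文字"]))
```

Applying the same routine to every dataset category (scene, web, document, handwriting) is what would make the reported numbers comparable across methods; any text normalization applied before scoring would need to be fixed in the same way.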
Related papers
- Multi-language Video Subtitle Dataset for Image-based Text Recognition [0.0]
This dataset includes 4,224 subtitle images extracted from 24 videos sourced from online platforms.
It features a wide variety of characters, including Thai consonants, vowels, tone marks, punctuation marks, numerals, Roman characters, and Arabic numerals.
arXiv Detail & Related papers (2024-11-07T00:06:53Z)
- MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts [0.6053347262128919]
MultiSocial dataset contains 472,097 texts, of which about 58k are human-written.
We use this benchmark to compare existing detection methods in zero-shot as well as fine-tuned form.
Our results indicate that fine-tuned detectors can be trained on social-media texts without difficulty.
arXiv Detail & Related papers (2024-06-18T12:26:09Z)
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
- The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in low-resource languages, especially the Swahili Language.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model by aligning printed character images with Ideographic Description Sequences (IDS).
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
- Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
- PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition [44.70246958636773]
We propose PageNet for end-to-end weakly supervised page-level HCTR.
PageNet detects and recognizes characters and predicts the reading order between them.
It can still output detection and recognition results at both the character and line levels.
arXiv Detail & Related papers (2022-07-29T17:47:45Z)
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by 5.3% on average across 11 benchmarks, with a similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
- Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel [4.028854207195064]
We propose a robust end-to-end Chinese text page spotter framework.
It unifies text detection and text recognition with a text kernel.
Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and ICDAR-2013 competition dataset.
arXiv Detail & Related papers (2021-07-04T05:42:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.