Related papers: Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study

URL: http://arxiv.org/abs/2112.15093v1
Date: Thu, 30 Dec 2021 15:30:52 GMT
Title: Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
Authors: Jingye Chen, Haiyang Yu, Jianqi Ma, Mengnan Guan, Xixi Xu, Xiaocong Wang, Shaobo Qu, Bin Li, Xiangyang Xue
Abstract summary: Existing text recognition methods mainly target English texts and overlook the pivotal role of Chinese texts. We manually collect Chinese text datasets from publicly available competitions, projects, and papers, then divide them into four categories: scene, web, document, and handwriting datasets. By analyzing the experimental results, we surprisingly observe that state-of-the-art baselines for recognizing English texts cannot perform well on Chinese scenarios.
Score: 25.609450020149637
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The flourishing blossom of deep learning has witnessed the rapid development of text recognition in recent years. However, the existing text recognition methods are mainly for English texts, whereas ignoring the pivotal role of Chinese texts. As another widely-spoken language, Chinese text recognition in all ways has extensive application markets. Based on our observations, we attribute the scarce attention on Chinese text recognition to the lack of reasonable dataset construction standards, unified evaluation methods, and results of the existing baselines. To fill this gap, we manually collect Chinese text datasets from publicly available competitions, projects, and papers, then divide them into four categories including scene, web, document, and handwriting datasets. Furthermore, we evaluate a series of representative text recognition methods on these datasets with unified evaluation methods to provide experimental results. By analyzing the experimental results, we surprisingly observe that state-of-the-art baselines for recognizing English texts cannot perform well on Chinese scenarios. We consider that there still remain numerous challenges under exploration due to the characteristics of Chinese texts, which are quite different from English texts. The code and datasets are made publicly available at https://github.com/FudanVI/benchmarking-chinese-text-recognition.

Related papers

Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts [2.9100667158464035]
Chinese scene text retrieval is a practical task that aims to search for images containing visual instances of a Chinese query text. Current efforts tend to inherit the solution for English scene text retrieval, failing to achieve satisfactory performance. We propose Chinese Scene Text Retrieval CLIP (CSTR-CLIP), a novel model that integrates global visual information with multi-granularity alignment training.
arXiv Detail & Related papers (2025-06-05T13:10:17Z)
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues [56.038123093599815]
Our objective is to translate continuous sign language into spoken language text. We incorporate additional contextual cues together with the signing video. We show that our contextual approach significantly enhances the quality of the translations.
arXiv Detail & Related papers (2025-01-16T18:59:03Z)
Research Experiment on Multi-Model Comparison for Chinese Text Classification Tasks [12.087640144194246]
This paper conducts a comparative study of three deep learning models (TextCNN, TextRNN, and FastText) specifically for Chinese text classification tasks. The performance of these models is evaluated, and their applicability in different scenarios is discussed.
arXiv Detail & Related papers (2024-12-25T13:54:40Z)
Multi-language Video Subtitle Dataset for Image-based Text Recognition [0.0]
This dataset includes 4,224 subtitle images extracted from 24 videos sourced from online platforms. It features a wide variety of characters, including Thai consonants, vowels, tone marks, punctuation marks, numerals, Roman characters, and Arabic numerals.
arXiv Detail & Related papers (2024-11-07T00:06:53Z)
MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts [0.6053347262128919]
The MultiSocial dataset contains 472,097 texts, of which about 58k are human-written. We use this benchmark to compare existing detection methods in zero-shot as well as fine-tuned form. Our results indicate that the fine-tuned detectors can be trained on social-media texts without difficulty.
arXiv Detail & Related papers (2024-06-18T12:26:09Z)
Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension. We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images. We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in low-resource languages, especially the Swahili Language. Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition. We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z)
Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models. We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning. Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR). We pre-train a CLIP-like model through aligning printed character images and Ideographic Description Sequences (IDS). This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character. The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition [44.70246958636773]
We propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them. It can still output detection and recognition results at both the character and line levels.
arXiv Detail & Related papers (2022-07-29T17:47:45Z)
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images. Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets. Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by 5.3% on average across 11 benchmarks, with similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel [4.028854207195064]
We propose a robust end-to-end Chinese text page spotter framework. It unifies text detection and text recognition with text kernel. Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and ICDAR-2013 competition dataset.
arXiv Detail & Related papers (2021-07-04T05:42:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.