PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten
Chinese Text Recognition
- URL: http://arxiv.org/abs/2207.14807v1
- Date: Fri, 29 Jul 2022 17:47:45 GMT
- Title: PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten
Chinese Text Recognition
- Authors: Dezhi Peng, Lianwen Jin, Yuliang Liu, Canjie Luo, Songxuan Lai
- Abstract summary: We propose PageNet for end-to-end weakly supervised page-level HCTR.
PageNet detects and recognizes characters and predicts the reading order between them.
Despite requiring only transcript-level annotations, it can still output detection and recognition results at both the character and line levels.
- Score: 44.70246958636773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten Chinese text recognition (HCTR) has been an active research topic
for decades. However, most previous studies solely focus on the recognition of
cropped text line images, ignoring the error caused by text line detection in
real-world applications. Although some approaches aimed at page-level text
recognition have been proposed in recent years, they either are limited to
simple layouts or require very detailed annotations including expensive
line-level and even character-level bounding boxes. To this end, we propose
PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and
recognizes characters and predicts the reading order between them, which is
more robust and flexible when dealing with complex layouts including
multi-directional and curved text lines. Utilizing the proposed weakly
supervised learning framework, PageNet requires only transcripts to be
annotated for real data; however, it can still output detection and recognition
results at both the character and line levels, avoiding the labor and cost of
labeling bounding boxes of characters and text lines. Extensive experiments
conducted on five datasets demonstrate the superiority of PageNet over existing
weakly supervised and fully supervised page-level methods. These experimental
results may spark further research beyond the realms of existing methods based
on connectionist temporal classification or attention. The source code is
available at https://github.com/shannanyinxiang/PageNet.
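To make the page-level reading-order decoding and the transcript-only supervision concrete, here is a minimal Python sketch. It assumes a model has already produced per-character positions, recognized labels, and reading-order successor links; the class, the field names, and the greedy alignment heuristic are illustrative assumptions for exposition, not the released PageNet code.
```python
# Illustrative sketch only: field names and heuristics are assumptions,
# not the authors' released implementation.
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class CharDet:
    cx: float                 # character box centre x
    cy: float                 # character box centre y
    label: str                # recognized character class
    next_id: Optional[int]    # predicted reading-order successor (None = end of line)


def decode_lines(chars: List[CharDet]) -> List[str]:
    """Follow the predicted reading-order links to group characters into text lines."""
    successors = {i: c.next_id for i, c in enumerate(chars)}
    referenced = {s for s in successors.values() if s is not None}
    starts = [i for i in range(len(chars)) if i not in referenced]
    lines = []
    for start in sorted(starts, key=lambda i: (chars[i].cy, chars[i].cx)):
        line, i, seen = [], start, set()
        while i is not None and i not in seen:   # guard against cyclic links
            seen.add(i)
            line.append(chars[i].label)
            i = successors[i]
        lines.append("".join(line))
    return lines


def pseudo_label_indices(predicted: str, transcript: str) -> List[int]:
    """Weak-supervision idea: keep only predicted characters that align with the
    page transcript; their boxes can then serve as pseudo ground truth."""
    matcher = SequenceMatcher(None, predicted, transcript, autojunk=False)
    keep: List[int] = []
    for block in matcher.get_matching_blocks():
        keep.extend(range(block.a, block.a + block.size))
    return keep


if __name__ == "__main__":
    page = [CharDet(10, 5, "汉", 1), CharDet(30, 5, "字", None)]
    lines = decode_lines(page)                          # ['汉字']
    print(lines)
    print(pseudo_label_indices("".join(lines), "汉字好"))  # [0, 1]
```
The alignment step hints at how real pages can be trained with transcripts only: characters whose recognition agrees with the page transcript can be trusted as pseudo ground truth for the next round of training (a simplification of the paper's actual weakly supervised framework).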
Related papers
- General Detection-based Text Line Recognition [15.761142324480165]
We introduce a general detection-based approach to text line recognition, be it printed (OCR) or handwritten (HTR).
Our approach builds on a paradigm completely different from that of state-of-the-art HTR methods, which rely on autoregressive decoding.
We improve on state-of-the-art performance for Chinese script recognition on the CASIA v2 dataset and for cipher recognition on the Borg and Copiale datasets.
arXiv Detail & Related papers (2024-09-25T17:05:55Z)
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between the two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieves state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer [21.479222207347238]
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting.
TTS can be trained in both fully- and weakly-supervised settings.
When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
arXiv Detail & Related papers (2022-02-11T08:50:09Z)
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA).
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA inference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for text of various shapes and incurs low labeling costs.
Experiments show that the proposed method bridges the performance gap between this weak labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and predicts character position and class in parallel; a minimal decoding sketch in this spirit follows this list.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
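As a rough illustration of how a word could be read off such pixel-wise maps, the sketch below assumes one soft order mask per reading position and a per-pixel class-probability map; the function name, array shapes, and the mass threshold are assumptions for exposition, not the TextScanner implementation.
```python
# Illustrative sketch only: shapes and the mass threshold are assumptions.
import numpy as np


def decode_from_maps(class_map: np.ndarray, order_maps: np.ndarray,
                     charset: str, min_mass: float = 1.0) -> str:
    """class_map: (C, H, W) per-pixel class probabilities;
    order_maps: (K, H, W) soft masks, one per reading position."""
    chars = []
    for order_mask in order_maps:              # iterate over reading positions
        mass = order_mask.sum()
        if mass < min_mass:                    # position not present -> stop decoding
            break
        # average the class scores over the region this reading position occupies
        scores = (class_map * order_mask[None]).sum(axis=(1, 2)) / mass
        chars.append(charset[int(scores.argmax())])
    return "".join(chars)


if __name__ == "__main__":
    charset = "ab"
    class_map = np.zeros((2, 1, 2)); class_map[0, 0, 0] = 1.0; class_map[1, 0, 1] = 1.0
    order_maps = np.zeros((2, 1, 2)); order_maps[0, 0, 0] = 1.0; order_maps[1, 0, 1] = 1.0
    print(decode_from_maps(class_map, order_maps, charset))  # -> "ab"
```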
The related-papers list above is automatically generated from the titles and abstracts of the papers on this site.