GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System
- URL: http://arxiv.org/abs/2404.14062v1
- Date: Mon, 22 Apr 2024 10:19:16 GMT
- Title: GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System
- Authors: Lalita Kumari, Sukhdeep Singh, Vaibhav Varish Singh Rathore, Anuj Sharma,
- Abstract summary: We present an end-to-end paragraph recognition system that incorporates internal line segmentation and convolutional layers based encoder.
This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-16, and word error rates of 5.73% on READ-2016 datasets.
- Score: 3.9527064697847005
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Handwritten Text Recognition problem has been a challenge for researchers for the last few decades, especially in the domain of computer vision, a subdomain of pattern recognition. Variability of texts amongst writers, cursiveness, and different font styles of handwritten texts with degradation of historical text images make it a challenging problem. Recognizing scanned document images in neural network-based systems typically involves a two-step approach: segmentation and recognition. However, this method has several drawbacks. These shortcomings encompass challenges in identifying text regions, analyzing layout diversity within pages, and establishing accurate ground truth segmentation. Consequently, these processes are prone to errors, leading to bottlenecks in achieving high recognition accuracies. Thus, in this study, we present an end-to-end paragraph recognition system that incorporates internal line segmentation and gated convolutional layers based encoder. The gating is a mechanism that controls the flow of information and allows to adaptively selection of the more relevant features in handwritten text recognition models. The attention module plays an important role in performing internal line segmentation, allowing the page to be processed line-by-line. During the decoding step, we have integrated a connectionist temporal classification-based word beam search decoder as a post-processing step. In this work, we have extended existing LexiconNet by carefully applying and utilizing gated convolutional layers in the existing deep neural network. Our results at line and page levels also favour our new GatedLexiconNet. This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-16, and word error rates of 5.73% on IAM, 2.76% on RIMES, and 6.52% on READ-2016 datasets.
Related papers
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z) - Handwritten and Printed Text Segmentation: A Signature Case Study [0.0]
We develop novel approaches to address the challenges of handwritten and printed text segmentation.
Our objective is to recover text from different classes in their entirety, especially enhancing the segmentation performance on overlapping sections.
Our best configuration outperforms prior work on two different datasets by 17.9% and 7.3% on IoU scores.
arXiv Detail & Related papers (2023-07-15T21:49:22Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Lexicon and Attention based Handwritten Text Recognition System [3.9097549127191473]
We have taken two state-of-the art neural networks systems and merged the attention mechanism with it.
We are able to achieve 4.15% character error rate and 9.72% word error rate on IAM dataset, 7.07% character error rate and 16.14% word error rate on GW dataset.
arXiv Detail & Related papers (2022-09-11T09:26:45Z) - SwinTextSpotter: Scene Text Spotting via Better Synergy between Text
Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z) - Continuous Offline Handwriting Recognition using Deep Learning Models [0.0]
Handwritten text recognition is an open problem of great interest in the area of automatic document image analysis.
We have proposed a new recognition model based on integrating two types of deep learning architectures: convolutional neural networks (CNN) and sequence-to-sequence (seq2seq)
The new proposed model provides competitive results with those obtained with other well-established methodologies.
arXiv Detail & Related papers (2021-12-26T07:31:03Z) - CDistNet: Perceiving Multi-Domain Character Distance for Robust Text
Recognition [87.3894423816705]
We propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding.
MDCDP uses the position embedding to query both visual and semantic features following the cross-attention mechanism.
We develop CDistNet that stacks multiple MDCDPs to guide a gradually precise distance modeling.
arXiv Detail & Related papers (2021-11-22T06:27:29Z) - Robust End-to-End Offline Chinese Handwriting Text Page Spotter with
Text Kernel [4.028854207195064]
We propose a robust end-to-end Chinese text page spotter framework.
It unifies text detection and text recognition with text kernel.
Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and ICDAR-2013 competition dataset.
arXiv Detail & Related papers (2021-07-04T05:42:04Z) - Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs)
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z) - TextScanner: Reading Characters in Order for Robust Scene Text
Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.