Related papers: GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System

URL: http://arxiv.org/abs/2404.14062v1
Date: Mon, 22 Apr 2024 10:19:16 GMT
Title: GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System
Authors: Lalita Kumari, Sukhdeep Singh, Vaibhav Varish Singh Rathore, Anuj Sharma
Abstract summary: We present an end-to-end paragraph recognition system that incorporates internal line segmentation and a gated-convolutional-layer-based encoder. This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-2016, and word error rates of 5.73% on IAM, 2.76% on RIMES, and 6.52% on READ-2016.
Score: 3.9527064697847005
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The Handwritten Text Recognition problem has been a challenge for researchers for the last few decades, especially in the domain of computer vision, a subdomain of pattern recognition. Variability of texts amongst writers, cursiveness, and different font styles of handwritten texts with degradation of historical text images make it a challenging problem. Recognizing scanned document images in neural network-based systems typically involves a two-step approach: segmentation and recognition. However, this method has several drawbacks. These shortcomings encompass challenges in identifying text regions, analyzing layout diversity within pages, and establishing accurate ground truth segmentation. Consequently, these processes are prone to errors, leading to bottlenecks in achieving high recognition accuracies. Thus, in this study, we present an end-to-end paragraph recognition system that incorporates internal line segmentation and a gated-convolutional-layer-based encoder. Gating is a mechanism that controls the flow of information and allows adaptive selection of the more relevant features in handwritten text recognition models. The attention module plays an important role in performing internal line segmentation, allowing the page to be processed line-by-line. During the decoding step, we have integrated a connectionist temporal classification-based word beam search decoder as a post-processing step. In this work, we have extended the existing LexiconNet by carefully applying and utilizing gated convolutional layers in the existing deep neural network. Our results at line and page levels also favour our new GatedLexiconNet. This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-2016, and word error rates of 5.73% on IAM, 2.76% on RIMES, and 6.52% on READ-2016 datasets.
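
The character and word error rates quoted in the abstract are conventionally computed as the edit distance between recognized and reference text, divided by the reference length. The abstract does not show the authors' evaluation script, so the following is only a minimal sketch of that standard definition; the function names (edit_distance, error_rate, cer, wer) and the whitespace tokenization for words are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edits over total reference length, accumulated across samples."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    if total_len == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_len


def cer(refs, hyps):
    """Character error rate: tokens are individual characters."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word error rate: tokens are whitespace-separated words (an assumption)."""
    return error_rate(refs, hyps, str.split)
```

Under this definition a single wrong character in a four-word line costs one character edit but a whole word edit, which is consistent with the word error rates in the abstract being higher than the character error rates.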

Related papers

Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text. Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on DeepFake dataset. Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z)
HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis [21.25786478579275]
Handwritten document recognition is one of the most challenging tasks in computer vision. Traditionally, this problem has been approached as two separate tasks, handwritten text recognition and layout analysis. This paper introduces HAND, a novel end-to-end and segmentation-free architecture for simultaneous text recognition and layout analysis tasks.
arXiv Detail & Related papers (2024-12-25T20:36:29Z)
Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach [53.189911918976655]
We propose DOLPHIN, a novel retrieval model designed to enhance handwriting representations through synergistic temporal-frequency analysis. We introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals. Our findings emphasize the significance of point sampling frequency and pressure features in improving handwriting representation quality.
arXiv Detail & Related papers (2024-12-16T11:19:22Z)
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2. We enhance the relationship between two tasks using novel Recognition Conversion and Recognition Alignment modules. SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
Handwritten and Printed Text Segmentation: A Signature Case Study [0.0]
We develop novel approaches to address the challenges of handwritten and printed text segmentation. Our objective is to recover text from different classes in their entirety, especially enhancing the segmentation performance on overlapping sections. Our best configuration outperforms prior work on two different datasets by 17.9% and 7.3% on IoU scores.
arXiv Detail & Related papers (2023-07-15T21:49:22Z)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
Lexicon and Attention based Handwritten Text Recognition System [3.9097549127191473]
We have taken two state-of-the-art neural network systems and merged the attention mechanism with them. We are able to achieve 4.15% character error rate and 9.72% word error rate on the IAM dataset, and 7.07% character error rate and 16.14% word error rate on the GW dataset.
arXiv Detail & Related papers (2022-09-11T09:26:45Z)
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter. Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism. The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
Continuous Offline Handwriting Recognition using Deep Learning Models [0.0]
Handwritten text recognition is an open problem of great interest in the area of automatic document image analysis. We have proposed a new recognition model based on integrating two types of deep learning architectures: convolutional neural networks (CNN) and sequence-to-sequence (seq2seq). The newly proposed model provides results competitive with those obtained with other well-established methodologies.
arXiv Detail & Related papers (2021-12-26T07:31:03Z)
CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition [87.3894423816705]
We propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding. MDCDP uses the position embedding to query both visual and semantic features following the cross-attention mechanism. We develop CDistNet that stacks multiple MDCDPs to guide a gradually precise distance modeling.
arXiv Detail & Related papers (2021-11-22T06:27:29Z)
Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel [4.028854207195064]
We propose a robust end-to-end Chinese text page spotter framework. It unifies text detection and text recognition with text kernel. Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and ICDAR-2013 competition dataset.
arXiv Detail & Related papers (2021-07-04T05:42:04Z)
Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs). We compare their accuracy and performance on widely used public datasets of scene and handwritten text. Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition. It generates pixel-wise, multi-channel segmentation maps for character class, position and order. It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.