UTRNet: High-Resolution Urdu Text Recognition In Printed Documents
- URL: http://arxiv.org/abs/2306.15782v3
- Date: Wed, 23 Aug 2023 10:02:15 GMT
- Title: UTRNet: High-Resolution Urdu Text Recognition In Printed Documents
- Authors: Abdur Rahman, Arjun Ghosh, and Chetan Arora
- Abstract summary: We propose a novel approach to address the challenges of printed Urdu text recognition.
Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets.
- Score: 5.179738379203527
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we propose a novel approach to address the challenges of
printed Urdu text recognition using high-resolution, multi-scale semantic
feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model,
demonstrates state-of-the-art performance on benchmark datasets. To address the
limitations of previous works, which struggle to generalize to the intricacies
of the Urdu script and suffer from a lack of sufficient annotated real-world
data, we have introduced UTRSet-Real, a large-scale annotated real-world dataset
comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset with 20,000
lines that closely resemble real-world data. We have further corrected the ground
truth of the existing IIITH dataset, making it a more reliable resource for future
research. We also provide UrduDoc, a benchmark dataset for Urdu text line
detection in scanned documents. Additionally, we have developed an online tool
for end-to-end Urdu OCR from printed documents by integrating UTRNet with a
text detection model. Our work not only addresses the current limitations of
Urdu OCR but also paves the way for future research in this area and
facilitates the continued advancement of Urdu OCR technology. The project page
with source code, datasets, annotations, trained models, and online tool is
available at abdur75648.github.io/UTRNet.
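Since the abstract describes a hybrid CNN-RNN recognizer operating on text-line images, the sketch below illustrates that general architecture family in PyTorch. It is not the authors' UTRNet implementation (which adds high-resolution, multi-scale semantic feature extraction); the layer sizes, character-set size, and input resolution are illustrative assumptions.
```python
# Minimal sketch of a hybrid CNN-RNN text-line recognizer (the architecture
# family the abstract describes). Not the authors' UTRNet; all sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class HybridCnnRnnRecognizer(nn.Module):
    def __init__(self, num_classes: int = 181, img_height: int = 32):
        # num_classes is a placeholder, not the actual Urdu character-set size.
        super().__init__()
        # CNN backbone: extracts a sequence of visual features along the width axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),   # collapse the height dimension
        )
        # RNN head: models left/right context over the feature sequence.
        self.rnn = nn.LSTM(256, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(512, num_classes)  # per-timestep class scores

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 1, H, W) grayscale text-line crops
        feats = self.cnn(images)                   # (batch, 256, 1, W')
        feats = feats.squeeze(2).permute(0, 2, 1)  # (batch, W', 256)
        context, _ = self.rnn(feats)               # (batch, W', 512)
        return self.classifier(context)            # (batch, W', num_classes)

if __name__ == "__main__":
    model = HybridCnnRnnRecognizer()
    dummy = torch.randn(2, 1, 32, 256)             # two synthetic line images
    print(model(dummy).shape)                      # torch.Size([2, 64, 181])
```
In practice, a model of this shape is trained with a CTC or attention-based objective over the per-timestep logits to produce the final Unicode transcription.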
Related papers
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
- A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text [2.2012643583422347]
This research paper introduces a novel word-level Optical Character Recognition (OCR) model specifically designed for digital Urdu text.
The model employs a permuted autoregressive sequence (PARSeq) architecture, which enhances its performance.
The model demonstrates a high level of accuracy in capturing the intricacies of Urdu script, achieving a CER of 0.178.
arXiv Detail & Related papers (2024-08-27T14:58:13Z)
- SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding [23.910783272007407]
This paper introduces SynthDoc, a novel synthetic document generation pipeline designed to enhance Visual Document Understanding (VDU).
Addressing the challenges of data acquisition and the limitations of existing datasets, SynthDoc leverages publicly available corpora and advanced rendering tools to create a comprehensive and versatile dataset.
Our experiments, conducted using the Donut model, demonstrate that models trained with SynthDoc's data achieve superior performance in pre-training read tasks and maintain robustness in downstream tasks, despite language inconsistencies.
arXiv Detail & Related papers (2024-08-27T03:31:24Z)
- mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval [67.50604814528553]
We first introduce a text encoder enhanced with RoPE and unpadding, pre-trained in a native 8192-token context.
Then we construct a hybrid text representation model (TRM) and a cross-encoder reranker via contrastive learning.
arXiv Detail & Related papers (2024-07-29T03:12:28Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Towards Deployable OCR models for Indic languages [33.6638069612644]
Modelling unsegmented sequences using Connectionist Temporal Classification (CTC) is the most commonly used approach for segmentation-free OCR.
We present a study of various neural network models that use CTC to transcribe the network's step-wise predictions into a Unicode sequence (a minimal CTC sketch is given after this list).
We also introduce a new public dataset called Mozhi for word and line recognition in Indian languages.
arXiv Detail & Related papers (2022-05-13T16:19:21Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- HCR-Net: A deep learning based script independent handwritten character recognition network [5.8067395321424975]
Handwritten character recognition (HCR) remains a challenging pattern recognition problem despite decades of research.
We have proposed a script independent deep learning network for HCR research, called HCR-Net, that sets a new research direction for the field.
arXiv Detail & Related papers (2021-08-15T05:48:07Z)
- Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
- EASTER: Efficient and Scalable Text Recognizer [0.0]
We present an Efficient And Scalable TExt Recognizer (EASTER) to perform optical character recognition on both machine printed and handwritten text.
Our model utilises 1-D convolutional layers without any recurrence, which enables parallel training with a considerably smaller volume of data.
We also showcase improvements over the current best results on offline handwritten text recognition task.
arXiv Detail & Related papers (2020-08-18T10:26:03Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
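As referenced in the "Towards Deployable OCR models for Indic languages" entry above, the following is a minimal sketch of CTC-based transcription: per-timestep logits are aligned to an unsegmented label sequence via CTC loss, and a greedy best-path decode collapses repeats and drops blanks. The shapes, blank index, and toy character-set size are assumptions, not values from any of the listed papers.
```python
# Minimal CTC sketch: loss over unsegmented targets plus greedy decoding.
import torch
import torch.nn as nn

BLANK = 0  # CTC blank index (class 0 by convention here)

def greedy_ctc_decode(logits: torch.Tensor) -> list[list[int]]:
    """logits: (batch, time, classes) -> best-path label sequences."""
    best = logits.argmax(dim=-1)                     # (batch, time)
    decoded = []
    for path in best.tolist():
        out, prev = [], None
        for idx in path:
            if idx != BLANK and idx != prev:         # collapse repeats, drop blanks
                out.append(idx)
            prev = idx
        decoded.append(out)
    return decoded

if __name__ == "__main__":
    batch, time_steps, num_classes = 2, 50, 100
    logits = torch.randn(batch, time_steps, num_classes)   # stand-in model output

    # nn.CTCLoss expects (time, batch, classes) log-probabilities.
    log_probs = logits.permute(1, 0, 2).log_softmax(dim=-1)
    targets = torch.randint(1, num_classes, (batch, 12))   # unsegmented label IDs
    input_lengths = torch.full((batch,), time_steps, dtype=torch.long)
    target_lengths = torch.full((batch,), 12, dtype=torch.long)

    loss = nn.CTCLoss(blank=BLANK)(log_probs, targets, input_lengths, target_lengths)
    print("CTC loss:", loss.item())
    print("Greedy decode of first sample:", greedy_ctc_decode(logits)[0][:10])
```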
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.