Arabic Handwritten Document OCR Solution with Binarization and Adaptive Scale Fusion Detection
- URL: http://arxiv.org/abs/2412.01601v1
- Date: Mon, 02 Dec 2024 15:21:09 GMT
- Title: Arabic Handwritten Document OCR Solution with Binarization and Adaptive Scale Fusion Detection
- Authors: Alhossien Waly, Bassant Tarek, Ali Feteha, Rewan Yehia, Gasser Amr, Ahmed Fares,
- Abstract summary: We present a complete OCR pipeline that starts with line segmentation and Adaptive Scale Fusion techniques to ensure accurate detection of text lines.
Our system, trained on the Arabic Multi-Fonts dataset, achieves a Character Recognition Rate (CRR) of 99.20% and a Word Recognition Rate (WRR) of 93.75% on single-word samples containing 7 to 10 characters.
- Score: 1.1655046053160683
- License:
- Abstract: The problem of converting images of text into plain text is a widely researched topic in both academia and industry. Arabic handwritten Text Recognation (AHTR) poses additional challenges due to diverse handwriting styles and limited labeled data. In this paper we present a complete OCR pipeline that starts with line segmentation using Differentiable Binarization and Adaptive Scale Fusion techniques to ensure accurate detection of text lines. Following segmentation, a CNN-BiLSTM-CTC architecture is applied to recognize characters. Our system, trained on the Arabic Multi-Fonts Dataset (AMFDS), achieves a Character Recognition Rate (CRR) of 99.20% and a Word Recognition Rate (WRR) of 93.75% on single-word samples containing 7 to 10 characters, along with a CRR of 83.76% for sentences. These results demonstrate the system's strong performance in handling Arabic scripts, establishing a new benchmark for AHTR systems.
Related papers
- Invizo: Arabic Handwritten Document Optical Character Recognition Solution [2.5819726282014654]
This work proposes an end-to-end solution for recognizing Arabic handwritten, printed, and Arabic numbers.
We reached 81.66% precision, 78.82% Recall, and 79.07% F-measure on a Text Detection task that powers the proposed solution.
arXiv Detail & Related papers (2025-02-07T19:25:33Z) - Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition.
We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR)
SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z) - GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System [3.9527064697847005]
We present an end-to-end paragraph recognition system that incorporates internal line segmentation and convolutional layers based encoder.
This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-16, and word error rates of 5.73% on READ-2016 datasets.
arXiv Detail & Related papers (2024-04-22T10:19:16Z) - LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs)
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z) - Arabic Handwritten Character Recognition based on Convolution Neural
Networks and Support Vector Machine [0.0]
We present an algorithm for recognizing Arabic letters and characters based on using deep convolution neural networks (DCNN) and support vector machine (SVM)
This paper addresses the problem of recognizing the Arabic handwritten characters by determining the similarity between the input templates and the pre-stored templates.
The experimental results of this work indicate the ability of the proposed algorithm to recognize, identify, and verify the input handwritten Arabic characters.
arXiv Detail & Related papers (2020-09-28T16:18:52Z) - An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document.
This paper also proposes an improved font-independent character algorithm that outperforms the state-of-the-art segmentation algorithms.
arXiv Detail & Related papers (2020-09-18T22:57:03Z) - Offline Handwritten Chinese Text Recognition with Convolutional Neural
Networks [5.984124397831814]
In this paper, we build the models using only the convolutional neural networks and use CTC as the loss function.
We achieve 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
arXiv Detail & Related papers (2020-06-28T14:34:38Z) - Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z) - TextScanner: Reading Characters in Order for Robust Scene Text
Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.