Related papers: Arabic Handwritten Document OCR Solution with Binarization and Adaptive Scale Fusion Detection

Arabic Handwritten Document OCR Solution with Binarization and Adaptive Scale Fusion Detection

URL: http://arxiv.org/abs/2412.01601v1
Date: Mon, 02 Dec 2024 15:21:09 GMT
Title: Arabic Handwritten Document OCR Solution with Binarization and Adaptive Scale Fusion Detection
Authors: Alhossien Waly, Bassant Tarek, Ali Feteha, Rewan Yehia, Gasser Amr, Ahmed Fares,
Abstract summary: We present a complete OCR pipeline that starts with line segmentation and Adaptive Scale Fusion techniques to ensure accurate detection of text lines.<n>Our system, trained on the Arabic Multi-Fonts dataset, achieves a Character Recognition Rate (CRR) of 99.20% and a Word Recognition Rate (WRR) of 93.75% on single-word samples containing 7 to 10 characters.
Score: 1.1655046053160683
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The problem of converting images of text into plain text is a widely researched topic in both academia and industry. Arabic handwritten Text Recognation (AHTR) poses additional challenges due to diverse handwriting styles and limited labeled data. In this paper we present a complete OCR pipeline that starts with line segmentation using Differentiable Binarization and Adaptive Scale Fusion techniques to ensure accurate detection of text lines. Following segmentation, a CNN-BiLSTM-CTC architecture is applied to recognize characters. Our system, trained on the Arabic Multi-Fonts Dataset (AMFDS), achieves a Character Recognition Rate (CRR) of 99.20% and a Word Recognition Rate (WRR) of 93.75% on single-word samples containing 7 to 10 characters, along with a CRR of 83.76% for sentences. These results demonstrate the system's strong performance in handling Arabic scripts, establishing a new benchmark for AHTR systems.

Related papers

QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation [0.8944616102795021]
We present Qari-OCR, a vision-language models progressively optimized for Arabic.<n>Qari-OCR establishes a new open-source state-of-the-art with a Word Error Rate (WER) of 0.160, Character Error Rate (CER) of 0.061, and BLEU score of 0.737 on diacritically-rich texts.
arXiv Detail & Related papers (2025-06-02T22:21:06Z)
Bridging the Gap: Fusing CNNs and Transformers to Decode the Elegance of Handwritten Arabic Script [0.0]
Handwritten Arabic script recognition is a challenging task due to the script's dynamic letter forms and contextual variations. This paper proposes a hybrid approach combining convolutional neural networks (CNNs) and Transformer-based architectures to address these complexities. Our ensemble achieves remarkable performance on the IFN/ENIT dataset, with 96.38% accuracy for letter classification and 97.22% for positional classification.
arXiv Detail & Related papers (2025-03-19T09:20:42Z)
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding [24.9462694200992]
KITAB-Bench is a comprehensive Arabic OCR benchmark that fills the gaps in current evaluation systems. Modern vision-language models (such as GPT-4, Gemini, and Qwen) outperform traditional OCR approaches by an average of 60% in Character Error Rate (CER) This work establishes a rigorous evaluation framework that can drive improvements in Arabic document analysis methods.
arXiv Detail & Related papers (2025-02-20T18:41:23Z)
Invizo: Arabic Handwritten Document Optical Character Recognition Solution [2.5819726282014654]
This work proposes an end-to-end solution for recognizing Arabic handwritten, printed, and Arabic numbers. We reached 81.66% precision, 78.82% Recall, and 79.07% F-measure on a Text Detection task that powers the proposed solution.
arXiv Detail & Related papers (2025-02-07T19:25:33Z)
Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition. We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR) SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z)
GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System [3.9527064697847005]
We present an end-to-end paragraph recognition system that incorporates internal line segmentation and convolutional layers based encoder. This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-16, and word error rates of 5.73% on READ-2016 datasets.
arXiv Detail & Related papers (2024-04-22T10:19:16Z)
LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression. We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols. It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs) We compare their accuracy and performance on widely used public datasets of scene and handwritten text. Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
Arabic Handwritten Character Recognition based on Convolution Neural Networks and Support Vector Machine [0.0]
We present an algorithm for recognizing Arabic letters and characters based on using deep convolution neural networks (DCNN) and support vector machine (SVM) This paper addresses the problem of recognizing the Arabic handwritten characters by determining the similarity between the input templates and the pre-stored templates. The experimental results of this work indicate the ability of the proposed algorithm to recognize, identify, and verify the input handwritten Arabic characters.
arXiv Detail & Related papers (2020-09-28T16:18:52Z)
An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document. This paper also proposes an improved font-independent character algorithm that outperforms the state-of-the-art segmentation algorithms.
arXiv Detail & Related papers (2020-09-18T22:57:03Z)
Offline Handwritten Chinese Text Recognition with Convolutional Neural Networks [5.984124397831814]
In this paper, we build the models using only the convolutional neural networks and use CTC as the loss function. We achieve 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
arXiv Detail & Related papers (2020-06-28T14:34:38Z)
Neural Computing for Online Arabic Handwriting Character Recognition using Hard Stroke Features Mining [0.0]
An enhanced method of detecting the desired critical points from vertical and horizontal direction-length of handwriting stroke features of online Arabic script recognition is proposed. A minimum feature set is extracted from these tokens for classification of characters using a multilayer perceptron with a back-propagation learning algorithm and modified sigmoid function-based activation function. The proposed method achieves an average accuracy of 98.6% comparable in state of art character recognition techniques.
arXiv Detail & Related papers (2020-05-02T23:17:08Z)
Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron. It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information. Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition. It generates pixel-wise, multi-channel segmentation maps for character class, position and order. It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.