Related papers: Enhanced Hybrid Technique for Efficient Digitization of Handwritten Marksheets

Enhanced Hybrid Technique for Efficient Digitization of Handwritten Marksheets

URL: http://arxiv.org/abs/2508.16295v1
Date: Fri, 22 Aug 2025 10:57:27 GMT
Title: Enhanced Hybrid Technique for Efficient Digitization of Handwritten Marksheets
Authors: Junaid Ahmed Sifat, Abir Chowdhury, Hasnat Md. Imtiaz, Md. Irtiza Hossain, Md. Imran Bin Azad,
Abstract summary: This work introduces a hybrid method that integrates OpenCV for table detection and PaddleOCR for recognizing sequential handwritten text.<n>YOLOv8 and Modified YOLOv8 are implemented for handwritten text recognition within the detected table structures alongside PaddleOCR.<n> Experimental results demonstrate that YOLOv8 Modified achieves an accuracy of 92.72 percent, outperforming PaddleOCR 91.37 percent and the YOLOv8 model 88.91 percent.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The digitization of handwritten marksheets presents huge challenges due to the different styles of handwriting and complex table structures in such documents like marksheets. This work introduces a hybrid method that integrates OpenCV for table detection and PaddleOCR for recognizing sequential handwritten text. The image processing capabilities of OpenCV efficiently detects rows and columns which enable computationally lightweight and accurate table detection. Additionally, YOLOv8 and Modified YOLOv8 are implemented for handwritten text recognition within the detected table structures alongside PaddleOCR which further enhance the system's versatility. The proposed model achieves high accuracy on our custom dataset which is designed to represent different and diverse handwriting styles and complex table layouts. Experimental results demonstrate that YOLOv8 Modified achieves an accuracy of 92.72 percent, outperforming PaddleOCR 91.37 percent and the YOLOv8 model 88.91 percent. This efficiency reduces the necessity for manual work which makes this a practical and fast solution for digitizing academic as well as administrative documents. This research serves the field of document automation, particularly handwritten document understanding, by providing operational and reliable methods to scale, enhance, and integrate the technologies involved.

Related papers

MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns [80.05126590825121]
MonkeyOCR v1.5 is a unified vision-language framework that enhances both layout understanding and content recognition.<n>To address complex table structures, we propose a visual consistency-based reinforcement learning scheme.<n>Two specialized modules, Image-Decoupled Table Parsing and Type-Guided Table Merging, are introduced to enable reliable parsing of tables.
arXiv Detail & Related papers (2025-11-13T15:12:17Z)
HandReader: Advanced Techniques for Efficient Fingerspelling Recognition [75.38606213726906]
This paper introduces HandReader, a group of three architectures designed to address the fingerspelling recognition task.<n>HandReader$_RGB$ employs the novel Adaptive Shift-Temporal Module (TSAM) to process RGB features from videos of varying lengths.<n>HandReader$_KP$ is built on the proposed Temporal Pose (TPE) operated on keypoints as tensors.<n>Each HandReader model possesses distinct advantages and achieves state-of-the-art results on the ChicagoFSWild and ChicagoFSWild+ datasets.
arXiv Detail & Related papers (2025-05-15T13:18:37Z)
Handwritten Digit Recognition: An Ensemble-Based Approach for Superior Performance [9.174021241188143]
This paper presents an ensemble-based approach that combines Convolutional Neural Networks (CNNs) with traditional machine learning techniques to improve recognition accuracy and robustness.<n>We evaluate our method on the MNIST dataset, comprising 70,000 handwritten digit images.<n>Our hybrid model, which uses CNNs for feature extraction and Support Vector Machines (SVMs) for classification, achieves an accuracy of 99.30%.
arXiv Detail & Related papers (2025-03-08T07:09:49Z)
HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis [21.25786478579275]
Handwritten document recognition is one of the most challenging tasks in computer vision.<n>Traditionally, this problem has been approached as two separate tasks, handwritten text recognition and layout analysis.<n>This paper introduces HAND, a novel end-to-end and segmentation-free architecture for simultaneous text recognition and layout analysis tasks.
arXiv Detail & Related papers (2024-12-25T20:36:29Z)
TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content [39.34067105360439]
We propose an end-to-end pipeline that integrates deep learning models, including DETR, CascadeTabNet, and PP OCR v2, to achieve comprehensive image-based table recognition. Our system achieves simultaneous table detection (TD), table structure recognition (TSR), and table content recognition (TCR) Our proposed approach achieves an IOU of 0.96 and an OCR Accuracy of 78%, showcasing a remarkable improvement of approximately 25% in the OCR Accuracy compared to the previous Table Transformer approach.
arXiv Detail & Related papers (2024-04-16T06:24:53Z)
Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types [1.2499537119440245]
This research paper presents a unique Bengali OCR system with some capabilities. The system excels in reconstructing document layouts while preserving structure, alignment, and images. Specialized models for word segmentation cater to diverse document types, including computer-composed, letterpress, typewriter, and handwritten documents.
arXiv Detail & Related papers (2024-02-07T18:02:33Z)
Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions [52.250269529057014]
Handwritten Text Recognition (HTR) in free-volution pages is a challenging image understanding task. We propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text.
arXiv Detail & Related papers (2022-08-17T06:55:54Z)
Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages. We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios. Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents. We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition. Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition. We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
arXiv Detail & Related papers (2021-05-23T21:17:18Z)
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding [49.941806975280045]
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks. We present text-bfLMv2 by pre-training text, layout and image in a multi-modal framework.
arXiv Detail & Related papers (2020-12-29T13:01:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.