Enhancement of Bengali OCR by Specialized Models and Advanced Techniques
for Diverse Document Types
- URL: http://arxiv.org/abs/2402.05158v1
- Date: Wed, 7 Feb 2024 18:02:33 GMT
- Title: Enhancement of Bengali OCR by Specialized Models and Advanced Techniques
for Diverse Document Types
- Authors: AKM Shahariar Azad Rabby, Hasmot Ali, Md. Majedul Islam, Sheikh
Abujar, Fuad Rahman
- Abstract summary: This research paper presents a unique Bengali OCR system with some capabilities.
The system excels in reconstructing document layouts while preserving structure, alignment, and images.
Specialized models for word segmentation cater to diverse document types, including computer-composed, letterpress, typewriter, and handwritten documents.
- Score: 1.2499537119440245
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This research paper presents a unique Bengali OCR system with some
capabilities. The system excels in reconstructing document layouts while
preserving structure, alignment, and images. It incorporates advanced image and
signature detection for accurate extraction. Specialized models for word
segmentation cater to diverse document types, including computer-composed,
letterpress, typewriter, and handwritten documents. The system handles static
and dynamic handwritten inputs, recognizing various writing styles.
Furthermore, it has the ability to recognize compound characters in Bengali.
Extensive data collection efforts provide a diverse corpus, while advanced
technical components optimize character and word recognition. Additional
contributions include image, logo, signature and table recognition, perspective
correction, layout reconstruction, and a queuing module for efficient and
scalable processing. The system demonstrates outstanding performance in
efficient and accurate text extraction and analysis.
Related papers
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
(2024-09-06T08:02:43Z)
- DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents [4.298545628576284]
We introduce DANIEL (Document Attention Network for Information Extraction and Labelling), a fully end-to-end architecture for handwritten document understanding.
DANIEL performs layout recognition, handwriting recognition, and named entity recognition on full-page documents.
It can simultaneously learn across multiple languages, layouts, and tasks.
(2024-07-12T09:09:56Z)
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" (DAT) is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performances across a variety of text-related benchmarks.
(2024-05-30T07:25:23Z)
- Visually Guided Generative Text-Layout Pre-training for Document Intelligence [51.09853181377696]
We propose visually guided generative text-layout pre-training, named ViTLP.
Given a document image, the model optimizes hierarchical language and layout modeling objectives to generate the interleaved text and layout sequence.
ViTLP can function as a native OCR model to localize and recognize texts of document images.
(2024-03-25T08:00:43Z)
- Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR).
The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified, using a non-dominated sorting genetic algorithm (NSGA-II).
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
(2023-11-27T11:44:46Z)
- Boosting Modern and Historical Handwritten Text Recognition with
Deformable Convolutions [52.250269529057014]
Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task.
We propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text.
(2022-08-17T06:55:54Z)
- Information Extraction from Scanned Invoice Images using Text Analysis
and Layout Features [0.0]
OCRMiner is designed to process documents much as a human reader does, i.e., by employing different layout and text attributes in a coordinated decision.
The system is able to recover the invoice data in 90% of cases for the English set and in 88% for the Czech set.
(2022-08-08T09:46:33Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich
Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
(2022-07-14T08:52:07Z)
- DocBed: A Multi-Stage OCR Solution for Documents with Complex Layouts [2.885058600042882]
This work releases a dataset of 3000 fully-annotated, real-world newspaper images from 21 different U.S. states.
It proposes layout segmentation as a precursor to existing optical character recognition (OCR) engines.
It provides a thorough and structured evaluation protocol for isolated layout segmentation and end-to-end OCR.
(2022-02-03T05:21:31Z)
- DOC2PPT: Automatic Presentation Slides Generation from Scientific
Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
(2021-01-28T03:21:17Z)
- An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document.
This paper also proposes an improved font-independent character segmentation algorithm that outperforms the state-of-the-art segmentation algorithms.
(2020-09-18T22:57:03Z)