Bengali Document Layout Analysis with Detectron2
- URL: http://arxiv.org/abs/2308.13769v1
- Date: Sat, 26 Aug 2023 05:29:09 GMT
- Title: Bengali Document Layout Analysis with Detectron2
- Authors: Md Ataullha and Mahedi Hassan Rabby and Mushfiqur Rahman and Tahsina
Bintay Azam
- Abstract summary: Document layout analysis involves segmenting documents into meaningful units like text boxes, paragraphs, images, and tables.
We improved the accuracy of the DLA model for Bengali documents by utilizing advanced Mask R-CNN models available in the Detectron2 library.
Results show the effectiveness of these models in accurately segmenting Bengali documents.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document digitization is vital for preserving historical records, efficient
document management, and advancing OCR (Optical Character Recognition)
research. Document Layout Analysis (DLA) involves segmenting documents into
meaningful units like text boxes, paragraphs, images, and tables. Challenges
arise when dealing with diverse layouts, historical documents, and unique
scripts like Bengali, hindered by the lack of comprehensive Bengali DLA
datasets. We improved the accuracy of the DLA model for Bengali documents by
utilizing advanced Mask R-CNN models available in the Detectron2 library. Our
evaluation involved three variants: Mask R-CNN R-50, R-101, and X-101, both
with and without pretrained weights from PubLayNet, on the BaDLAD dataset,
which contains human-annotated Bengali documents in four categories: text
boxes, paragraphs, images, and tables. Results show the effectiveness of these
models in accurately segmenting Bengali documents. We discuss speed-accuracy
tradeoffs and underscore the significance of pretrained weights. Our findings
expand the applicability of Mask R-CNN in document layout analysis, efficient
document management, and OCR research while suggesting future avenues for
fine-tuning and data augmentation.
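The evaluation described in the abstract (three Mask R-CNN backbones, each with and without PubLayNet pretraining, on four BaDLAD categories) can be laid out as a small experiment grid. The following is a minimal sketch, not the authors' code. The config paths are Detectron2 model-zoo names, but the choice of the FPN / 3x-schedule variants, the run names, the class label strings, and the helper `build_experiment_grid` are all assumptions made for illustration.

```python
from itertools import product

# Detectron2 model-zoo config paths for the three backbones named in the abstract.
# Assumption: the FPN / 3x-schedule variants; the abstract does not specify them.
BACKBONE_CONFIGS = {
    "R-50": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
    "R-101": "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml",
    "X-101": "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml",
}

# The four BaDLAD categories listed in the abstract (label strings are hypothetical).
BADLAD_CLASSES = ["text_box", "paragraph", "image", "table"]


def build_experiment_grid():
    """Return one run description per (backbone, PubLayNet init) combination."""
    runs = []
    for backbone, use_publaynet in product(BACKBONE_CONFIGS, (False, True)):
        suffix = "publaynet" if use_publaynet else "no_publaynet"
        runs.append(
            {
                "name": f"{backbone}_{suffix}",  # hypothetical naming scheme
                "config": BACKBONE_CONFIGS[backbone],
                "publaynet_init": use_publaynet,
                "num_classes": len(BADLAD_CLASSES),
            }
        )
    return runs


if __name__ == "__main__":
    for run in build_experiment_grid():
        print(run["name"], run["config"], run["num_classes"])
```

With Detectron2 installed, each `config` string could presumably be resolved through `detectron2.model_zoo.get_config_file` and `cfg.MODEL.ROI_HEADS.NUM_CLASSES` set to `num_classes`. Where the PubLayNet weights come from is left open here because the abstract does not say.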
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Unifying Multimodal Retrieval via Document Screenshot Embedding [92.03571344075607]
Document Screenshot Embedding (DSE) is a novel retrieval paradigm that regards document screenshots as a unified input format.
We first craft Wiki-SS, a dataset of 1.3M Wikipedia web page screenshots, as the corpus for answering questions from the Natural Questions dataset.
In such a text-intensive document retrieval setting, DSE shows competitive effectiveness compared to other text retrieval methods relying on parsing.
arXiv Detail & Related papers (2024-06-17T06:27:35Z) - Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents [31.434507306952458]
We propose KNN-former, which incorporates a new kind of bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities.
We also use combinatorial matching to address the one-to-one mapping property that exists in many documents.
Our method is highly efficient compared to existing approaches in terms of the number of trainable parameters.
arXiv Detail & Related papers (2024-05-08T10:10:38Z)
- Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach [0.6562256987706128]
We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness.
We fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation.
Our experiments provided key insights to incorporate new strategies into the established solution.
arXiv Detail & Related papers (2023-09-02T07:17:43Z)
- bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents [0.23639235997306196]
We introduce Bengali.AI-BRACU-OCR (bbOCR), an open-source scalable document OCR system that can reconstruct Bengali documents into a structured searchable digitized format.
Our proposed solution is preferable over the current state-of-the-art Bengali OCR systems.
arXiv Detail & Related papers (2023-08-21T11:35:28Z)
- Performance Enhancement Leveraging Mask-RCNN on Bengali Document Layout Analysis [0.0]
In the DL Sprint 2.0 competition, we worked on understanding Bangla documents.
We used a dataset called BaDLAD with lots of examples.
We trained a special model called Mask R-CNN to help with this understanding.
arXiv Detail & Related papers (2023-08-21T06:51:58Z)
- BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset [1.2015699532079325]
This dataset contains 33,695 human annotated document samples from six domains.
We demonstrate the efficacy of our dataset in training deep learning based Bengali document models.
arXiv Detail & Related papers (2023-03-09T15:15:55Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation [49.940525611640346]
The Document Augmentation for dense Retrieval (DAR) framework augments the representations of documents with their interpolation and perturbation.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- DocBank: A Benchmark Dataset for Document Layout Analysis [114.81155155508083]
We present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis.
Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv Detail & Related papers (2020-06-01T16:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.