Related papers: Bengali Document Layout Analysis with Detectron2

URL: http://arxiv.org/abs/2308.13769v1
Date: Sat, 26 Aug 2023 05:29:09 GMT
Title: Bengali Document Layout Analysis with Detectron2
Authors: Md Ataullha and Mahedi Hassan Rabby and Mushfiqur Rahman and Tahsina Bintay Azam
Abstract summary: Document layout analysis involves segmenting documents into meaningful units like text boxes, paragraphs, images, and tables. We improved the accuracy of the DLA model for Bengali documents by utilizing advanced Mask R-CNN models available in the Detectron2 library. Results show the effectiveness of these models in accurately segmenting Bengali documents.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Document digitization is vital for preserving historical records, efficient document management, and advancing OCR (Optical Character Recognition) research. Document Layout Analysis (DLA) involves segmenting documents into meaningful units like text boxes, paragraphs, images, and tables. Challenges arise when dealing with diverse layouts, historical documents, and unique scripts like Bengali, and they are compounded by the lack of comprehensive Bengali DLA datasets. We improved the accuracy of the DLA model for Bengali documents by utilizing advanced Mask R-CNN models available in the Detectron2 library. Our evaluation involved three variants: Mask R-CNN R-50, R-101, and X-101, each with and without pretrained weights from PubLayNet, on the BaDLAD dataset, which contains human-annotated Bengali documents in four categories: text boxes, paragraphs, images, and tables. Results show the effectiveness of these models in accurately segmenting Bengali documents. We discuss speed-accuracy tradeoffs and underscore the significance of pretrained weights. Our findings expand the applicability of Mask R-CNN in document layout analysis, efficient document management, and OCR research while suggesting future avenues for fine-tuning and data augmentation.

Related papers

DREAM: Document Reconstruction via End-to-end Autoregressive Model [53.51754520966657]
We present an innovative autoregressive model specifically designed for document reconstruction, referred to as Document Reconstruction via End-to-end Autoregressive Model (DREAM). We establish a standardized definition of the document reconstruction task, and introduce a novel Document Similarity Metric (DSM) and DocRec1K dataset for assessing the performance of the task.
arXiv Detail & Related papers (2025-07-08T09:24:07Z)
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs [12.745520645025808]
We benchmark Graph Neural Network (GNN) architectures for the task of fine-grained layout classification of text blocks from digital native documents. Our results demonstrate that GraphSAGE operating on the k-closest-neighbor graph in a dual-branch configuration achieves the highest per-class and overall accuracy.
arXiv Detail & Related papers (2025-05-12T10:59:30Z)
A RAG-Based Institutional Assistant [0.1499944454332829]
We design and evaluate a RAG-based virtual assistant specifically tailored for the University of Sao Paulo. Our optimal retriever model achieves a Top-5 accuracy of 30%, while our most effective generative model scores 22.04% against ground truth answers.
arXiv Detail & Related papers (2025-01-23T17:54:19Z)
Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings. First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss. Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
Unifying Multimodal Retrieval via Document Screenshot Embedding [92.03571344075607]
Document Screenshot Embedding (DSE) is a novel retrieval paradigm that regards document screenshots as a unified input format. We first craft the dataset of Wiki-SS, a 1.3M Wikipedia web page screenshots as the corpus to answer the questions from the Natural Questions dataset. In such a text-intensive document retrieval setting, DSE shows competitive effectiveness compared to other text retrieval methods relying on parsing.
arXiv Detail & Related papers (2024-06-17T06:27:35Z)
Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents [31.434507306952458]
We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We also use combinatorial matching to address the one-to-one mapping property that exists in many documents. Our method is highly efficient compared to existing approaches in terms of the number of trainable parameters.
arXiv Detail & Related papers (2024-05-08T10:10:38Z)
Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach [0.6562256987706128]
We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. We fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our experiments provided key insights to incorporate new strategies into the established solution.
arXiv Detail & Related papers (2023-09-02T07:17:43Z)
bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents [0.23639235997306196]
We introduce Bengali.AI-BRACU-OCR (bbOCR), an open-source scalable document OCR system that can reconstruct Bengali documents into a structured searchable digitized format. Our proposed solution is preferable over the current state-of-the-art Bengali OCR systems.
arXiv Detail & Related papers (2023-08-21T11:35:28Z)
Performance Enhancement Leveraging Mask-RCNN on Bengali Document Layout Analysis [0.0]
In the DL Sprint 2.0 competition, we worked on understanding Bangla documents. We used a dataset called BaDLAD with lots of examples. We trained a special model called Mask R-CNN to help with this understanding.
arXiv Detail & Related papers (2023-08-21T06:51:58Z)
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset [1.2015699532079325]
This dataset contains 33,695 human annotated document samples from six domains. We demonstrate the efficacy of our dataset in training deep learning based Bengali document models.
arXiv Detail & Related papers (2023-03-09T15:15:55Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation [49.940525611640346]
The Document Augmentation for dense Retrieval (DAR) framework augments the representations of documents with their interpolation and perturbation. We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z)
One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios. Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents. We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
DocBank: A Benchmark Dataset for Document Layout Analysis [114.81155155508083]
We present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv Detail & Related papers (2020-06-01T16:04:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.