Framework and Model Analysis on Bengali Document Layout Analysis Dataset: BaDLAD
- URL: http://arxiv.org/abs/2309.16700v1
- Date: Tue, 15 Aug 2023 07:52:24 GMT
- Title: Framework and Model Analysis on Bengali Document Layout Analysis Dataset: BaDLAD
- Authors: Kazi Reyazul Hasan (1), Mubasshira Musarrat (1), Sadif Ahmed (1) and
Shahriar Raj (1) ((1) Bangladesh University of Engineering and Technology)
- Abstract summary: This study focuses on understanding Bengali Document Layouts using advanced computer programs: Detectron2, YOLOv8, and SAM.
By comparing their accuracy and speed, we learned which one is good for different types of documents.
- Score: 0.7925493098304448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study focuses on understanding Bengali Document Layouts using advanced
computer programs: Detectron2, YOLOv8, and SAM. We looked at lots of different
Bengali documents in our study. Detectron2 is great at finding and separating
different parts of documents, like text boxes and paragraphs. YOLOv8 is good at
figuring out different tables and pictures. We also tried SAM, which helps us
understand tricky layouts. We tested these programs to see how well they work.
By comparing their accuracy and speed, we learned which one is good for
different types of documents. Our research helps make sense of complex layouts
in Bengali documents and can be useful for other languages too.
Related papers
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text [112.60163342249682]
We introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset.
Our dataset is 15 times larger in scale while maintaining good data quality.
We hope this could provide a solid data foundation for future multimodal model research.
arXiv Detail & Related papers (2024-06-12T17:01:04Z)
- What's In My Big Data? [67.04525616289949]
We propose What's In My Big Data? (WIMBD), a platform and a set of sixteen analyses that allow us to reveal and compare the contents of large text corpora.
WIMBD builds on two basic capabilities -- count and search -- at scale, which allows us to analyze more than 35 terabytes on a standard compute node.
Our analysis uncovers several surprising and previously undocumented findings about these corpora, including the high prevalence of duplicate, synthetic, and low-quality content.
arXiv Detail & Related papers (2023-10-31T17:59:38Z)
- Bengali Document Layout Analysis with Detectron2 [0.0]
Document layout analysis involves segmenting documents into meaningful units like text boxes, paragraphs, images, and tables.
We improved the accuracy of the DLA model for Bengali documents by utilizing advanced Mask R-CNN models available in the Detectron2 library.
Results show the effectiveness of these models in accurately segmenting Bengali documents.
arXiv Detail & Related papers (2023-08-26T05:29:09Z)
- Performance Enhancement Leveraging Mask-RCNN on Bengali Document Layout Analysis [0.0]
In the DL Sprint 2.0 competition, we worked on understanding Bangla documents.
We used a dataset called BaDLAD with lots of examples.
We trained a special model called Mask R-CNN to help with this understanding.
arXiv Detail & Related papers (2023-08-21T06:51:58Z)
- Document Layout Annotation: Database and Benchmark in the Domain of Public Affairs [62.38140271294419]
We propose a procedure to semi-automatically annotate digital documents with different layout labels.
We collect a novel database for DLA in the public affairs domain using a set of 24 data sources from the Spanish Administration.
The results of our experiments validate the proposed text labeling procedure with accuracy up to 99%.
arXiv Detail & Related papers (2023-06-12T08:21:50Z)
- Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents [54.744701806413204]
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers.
We test whether layout-infused LMs are robust to layout distribution shifts.
arXiv Detail & Related papers (2023-06-01T18:01:33Z)
- BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset [1.2015699532079325]
This dataset contains 33,695 human annotated document samples from six domains.
We demonstrate the efficacy of our dataset in training deep learning based Bengali document models.
arXiv Detail & Related papers (2023-03-09T15:15:55Z)
- Sentiment analysis in Bengali via transfer learning using multi-lingual BERT [0.9883261192383611]
In this paper, we present manually tagged 2-class and 3-class SA datasets in Bengali.
We also demonstrate that the multi-lingual BERT model with relevant extensions can be trained via the approach of transfer learning.
This deep learning model achieves an accuracy of 71% for 2-class sentiment classification compared to the current state-of-the-art accuracy of 68%.
arXiv Detail & Related papers (2020-12-03T10:21:11Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Anubhuti -- An annotated dataset for emotional analysis of Bengali short stories [2.3424047967193826]
Anubhuti is the first and largest text corpus for analyzing emotions expressed by writers of Bengali short stories.
We explain the data collection methods, the manual annotation process and the resulting high inter-annotator agreement.
We have verified the performance of our dataset with baseline Machine Learning and a Deep Learning model for emotion classification.
arXiv Detail & Related papers (2020-10-06T22:33:58Z)
- DocBank: A Benchmark Dataset for Document Layout Analysis [114.81155155508083]
We present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis.
Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv Detail & Related papers (2020-06-01T16:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.