Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach
- URL: http://arxiv.org/abs/2309.00848v4
- Date: Mon, 16 Sep 2024 19:52:21 GMT
- Title: Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach
- Authors: Nazmus Sakib Ahmed, Saad Sakib Noor, Ashraful Islam Shanto Sikder, Abhijit Paul
- Abstract summary: We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness.
We fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation.
Our experiments provided key insights to incorporate new strategies into the established solution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous validation set evaluation, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures, addressing issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension; BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights for incorporating new strategies into the established solution.
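The ensembling step the abstract describes can be sketched as an IoU-based fusion of per-class bounding boxes from several detectors. This is a minimal illustration only: the paper does not specify its exact merging rule, and the function names and the 0.55 threshold below are hypothetical choices, not the authors' method.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def fuse_detections(model_outputs, iou_thr=0.55):
    """Greedy, score-ordered fusion: overlapping same-class boxes from
    different models are merged by averaging coordinates and confidences.
    Each detection is a tuple (class_id, score, (x1, y1, x2, y2))."""
    pool = sorted((d for out in model_outputs for d in out),
                  key=lambda d: d[1], reverse=True)
    fused = []  # entries: (class_id, score, box, count_of_merged_boxes)
    for cls, score, box in pool:
        for i, (fcls, fscore, fbox, n) in enumerate(fused):
            if fcls == cls and iou(box, fbox) >= iou_thr:
                # running average of coordinates and confidence
                avg = tuple((fb * n + b) / (n + 1) for fb, b in zip(fbox, box))
                fused[i] = (fcls, (fscore * n + score) / (n + 1), avg, n + 1)
                break
        else:
            fused.append((cls, score, box, 1))
    return [(c, s, b) for c, s, b, _ in fused]
```

In a real pipeline the inputs would come from each YOLOv8 model's predicted boxes for one page image; averaging is just one plausible merge rule, and alternatives such as keeping the highest-confidence box would slot into the same loop.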
Related papers
- Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets [22.29915616018026]
Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks.
Our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules.
We introduce a three-stage training approach, expressly developed to enhance the model's ability to align auditory and textual information.
arXiv Detail & Related papers (2024-05-03T14:35:58Z) - A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z) - Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z) - Ensemble of Anchor-Free Models for Robust Bangla Document Layout Segmentation [0.0]
We introduce a novel approach designed for the purpose of segmenting the layout of Bangla documents.
Our methodology involves the utilization of a sophisticated ensemble of YOLOv8 models, which were trained for the DL Sprint 2.0 - BUET CSE Fest 2023 Competition.
arXiv Detail & Related papers (2023-08-28T08:24:25Z) - Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications of IT to different modalities, domains, and tasks.
We also review the potential pitfalls of IT and the criticism against it, point out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
arXiv Detail & Related papers (2023-08-21T15:35:16Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Continual Contrastive Finetuning Improves Low-Resource Relation Extraction [34.76128090845668]
Relation extraction has been particularly challenging in low-resource scenarios and domains.
Recent literature has tackled low-resource RE by self-supervised learning.
We propose to pretrain and finetune the RE model using consistent objectives of contrastive learning.
arXiv Detail & Related papers (2022-12-21T07:30:22Z) - Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z) - Accurate Fine-grained Layout Analysis for the Historical Tibetan Document Based on the Instance Segmentation [0.9420795715422711]
This paper presents a fine-grained sub-line level layout analysis approach to perform layout analysis on the Kangyur historical Tibetan document.
We introduce an accelerated method for building a dynamic and reliable dataset.
Once the network is trained, instances of the text line, sentence, and titles can be segmented and identified.
The experimental results show that the proposed method delivers a decent 72.7% AP on our dataset.
arXiv Detail & Related papers (2021-10-15T15:49:44Z) - BERT based sentiment analysis: A software engineering perspective [0.9176056742068814]
The paper presents three different strategies to analyse BERT based model for sentiment analysis.
The experimental results show that the BERT based ensemble approach and the compressed BERT model improve on prevailing tools by 6-12% in F1 measure on all three datasets.
arXiv Detail & Related papers (2021-06-04T16:28:26Z) - PoBRL: Optimizing Multi-Document Summarization by Blending Reinforcement Learning Policies [68.8204255655161]
We propose a reinforcement learning based framework PoBRL for solving multi-document summarization.
Our strategy decouples the multi-objective optimization into different subproblems that can be solved individually by reinforcement learning.
Our empirical analysis shows state-of-the-art performance on several multi-document datasets.
arXiv Detail & Related papers (2021-05-18T02:55:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.