Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach
- URL: http://arxiv.org/abs/2309.00848v4
- Date: Mon, 16 Sep 2024 19:52:21 GMT
- Title: Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach
- Authors: Nazmus Sakib Ahmed, Saad Sakib Noor, Ashraful Islam Shanto Sikder, Abhijit Paul
- Abstract summary: We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness.
We fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation.
Our experiments provided key insights to incorporate new strategies into the established solution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous validation set evaluation, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures, addressing issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension. BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights to incorporate new strategies into the established solution.
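The ensembling idea described in the abstract, pooling the predictions of several detectors and merging overlapping boxes in a post-processing step, can be sketched as follows. This is a minimal illustrative implementation of cross-model non-maximum suppression, not the authors' actual pipeline; the box coordinates, scores, and class ids are made up for demonstration.

```python
# Illustrative sketch of detection ensembling via non-maximum suppression (NMS).
# Box format: (x1, y1, x2, y2, score, class_id). All values below are hypothetical.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def ensemble_nms(model_outputs, iou_thresh=0.5):
    """Pool detections from several models, then greedily keep the
    highest-scoring box among mutually overlapping same-class detections."""
    pooled = sorted((d for out in model_outputs for d in out),
                    key=lambda d: d[4], reverse=True)
    kept = []
    for det in pooled:
        if all(det[5] != k[5] or iou(det[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(det)
    return kept

# Two models detect roughly the same layout element; NMS keeps the stronger box.
model_a = [(10, 10, 100, 50, 0.90, 0)]                              # class 0: e.g. "paragraph"
model_b = [(12, 11, 102, 52, 0.85, 0), (200, 10, 300, 60, 0.70, 1)]  # class 1: e.g. "image"
merged = ensemble_nms([model_a, model_b])
print(merged)  # two boxes survive: the 0.90 paragraph and the 0.70 image
```

In practice, weighted box fusion (averaging overlapping boxes instead of discarding the weaker ones) is a common alternative merge rule for detector ensembles.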
Related papers
- Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
Most open-source vision-language models only publish their final model weights, leaving critical details of data strategies and implementation largely opaque.
In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs.
By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community.
arXiv Detail & Related papers (2025-01-20T18:40:47Z)
- Multi-modal Retrieval Augmented Multi-modal Generation: Datasets, Evaluation Metrics and Strong Baselines
Multi-modal Retrieval Augmented Multi-modal Generation (M$2$RAG) is a novel task that enables foundation models to process multi-modal web content.
Despite its potential impact, M$2$RAG remains understudied, lacking comprehensive analysis and high-quality data resources.
arXiv Detail & Related papers (2024-11-25T13:20:19Z)
- Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks.
Our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules.
We introduce a three-stage training approach, expressly developed to enhance the model's ability to align auditory and textual information.
arXiv Detail & Related papers (2024-05-03T14:35:58Z)
- A Large-Scale Evaluation of Speech Foundation Models
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- Ensemble of Anchor-Free Models for Robust Bangla Document Layout Segmentation
We introduce a novel approach designed for the purpose of segmenting the layout of Bangla documents.
Our methodology involves the utilization of a sophisticated ensemble of YOLOv8 models, which were trained for the DL Sprint 2.0 - BUET CSE Fest 2023 Competition.
arXiv Detail & Related papers (2023-08-28T08:24:25Z)
- Continual Contrastive Finetuning Improves Low-Resource Relation Extraction
Relation extraction has been particularly challenging in low-resource scenarios and domains.
Recent literature has tackled low-resource RE by self-supervised learning.
We propose to pretrain and finetune the RE model using consistent objectives of contrastive learning.
arXiv Detail & Related papers (2022-12-21T07:30:22Z)
- Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z)
- Accurate Fine-grained Layout Analysis for the Historical Tibetan Document Based on the Instance Segmentation
This paper presents a fine-grained sub-line level layout analysis approach to perform layout analysis on the Kangyur historical Tibetan document.
We introduce an accelerated method for building a dynamic and reliable dataset.
Once the network is trained, instances of the text line, sentence, and titles can be segmented and identified.
The experimental results show that the proposed method delivers a decent 72.7% AP on our dataset.
arXiv Detail & Related papers (2021-10-15T15:49:44Z)
- BERT based sentiment analysis: A software engineering perspective
The paper presents three different strategies to analyse BERT-based models for sentiment analysis.
The experimental results show that the BERT based ensemble approach and the compressed BERT model attain improvements by 6-12% over prevailing tools for the F1 measure on all three datasets.
arXiv Detail & Related papers (2021-06-04T16:28:26Z)
- PoBRL: Optimizing Multi-Document Summarization by Blending Reinforcement Learning Policies
We propose a reinforcement learning based framework PoBRL for solving multi-document summarization.
Our strategy decouples this multi-objective optimization into different subproblems that can be solved individually by reinforcement learning.
Our empirical analysis shows state-of-the-art performance on several multi-document datasets.
arXiv Detail & Related papers (2021-05-18T02:55:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.