Ensemble of Anchor-Free Models for Robust Bangla Document Layout
Segmentation
- URL: http://arxiv.org/abs/2308.14397v2
- Date: Tue, 29 Aug 2023 11:46:44 GMT
- Title: Ensemble of Anchor-Free Models for Robust Bangla Document Layout
Segmentation
- Authors: U Mong Sain Chak, Md. Asib Rahman
- Abstract summary: We introduce an approach to segmenting the layout of Bangla documents.
Our method uses an ensemble of YOLOv8 models trained for the DL Sprint 2.0 - BUET CSE Fest 2023 Competition.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce an approach to segmenting the layout of Bangla
documents. Our method uses an ensemble of YOLOv8 models trained for the
DL Sprint 2.0 - BUET CSE Fest 2023 Competition on Bangla document layout
segmentation. We focus on three aspects of the task: image augmentation,
model architecture, and model ensembling. We deliberately reduce the quality
of a subset of document images to make training more robust, which improves
our cross-validation score. Using Bayesian optimization, we determine the
optimal confidence and Intersection over Union (IoU) thresholds for the model
ensemble. Our results demonstrate the effectiveness of anchor-free models for
robust layout segmentation of Bangla documents.
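
To make the role of the two tuned thresholds concrete, here is a minimal sketch of how a confidence threshold and an IoU threshold could act on pooled ensemble predictions. The abstract does not say how the authors merge their models' outputs, so the pooling-plus-greedy-suppression scheme, the use of axis-aligned boxes instead of segmentation masks, and the names `iou` and `merge_ensemble` are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_ensemble(per_model: List[List[Detection]],
                   conf_thr: float,
                   iou_thr: float) -> List[Detection]:
    """Pool detections from all models, then apply both thresholds.

    Assumed scheme (not taken from the paper): drop detections below
    conf_thr, sort by score, and greedily keep a detection only if its
    IoU with every already-kept detection is at most iou_thr.
    """
    pooled = [d for dets in per_model for d in dets if d[1] >= conf_thr]
    pooled.sort(key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for box, score in pooled:
        if all(iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score))
    return kept


# Toy predictions from two hypothetical ensemble members.
model_a = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.3)]
model_b = [((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
merged = merge_ensemble([model_a, model_b], conf_thr=0.5, iou_thr=0.5)
```

A threshold search such as the Bayesian optimization mentioned in the abstract would repeatedly call a function like `merge_ensemble` with candidate `(conf_thr, iou_thr)` pairs and score the merged output on validation data; that outer loop is omitted here.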
Related papers
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
A latent diffusion model (LDM) is an effective minimalist for in-context segmentation.
We build a new and fair in-context segmentation benchmark that includes both image and video datasets.
arXiv Detail & Related papers (2024-03-14T17:52:31Z)
- A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting [3.5066463427087777]
We introduce two lightweight models to enhance the versatility of crowd-counting models.
These models maintain the same downstream architecture while incorporating two distinct backbones: MobileNet and MobileViT.
We leverage Adjacent Feature Fusion to extract diverse scale features from a Pre-Trained Model (PTM) and subsequently combine these features seamlessly.
arXiv Detail & Related papers (2024-01-11T15:13:31Z)
- Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach [0.6562256987706128]
We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness.
We fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation.
Our experiments provided key insights to incorporate new strategies into the established solution.
arXiv Detail & Related papers (2023-09-02T07:17:43Z)
- Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
- Improving Transferability of Adversarial Examples via Bayesian Attacks [84.90830931076901]
We introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters.
Our method achieves a new state-of-the-art on transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively.
arXiv Detail & Related papers (2023-07-21T03:43:07Z)
- WeLayout: WeChat Layout Analysis System for the ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents [42.1096906112963]
We introduce WeLayout, a novel system for segmenting the layout of corporate documents.
Our method significantly surpasses the baseline, securing a top position on the leaderboard with a mAP of 70.0.
arXiv Detail & Related papers (2023-05-11T04:05:30Z)
- Multimodal Side-Tuning for Document Classification [3.0229888038442914]
Side-tuning is a methodology for network adaptation recently introduced to solve some of the problems related to previous approaches.
We show that side-tuning can be successfully employed also when different data sources are considered.
arXiv Detail & Related papers (2023-01-16T11:08:03Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)