WeLayout: WeChat Layout Analysis System for the ICDAR 2023 Competition
on Robust Layout Segmentation in Corporate Documents
- URL: http://arxiv.org/abs/2305.06553v1
- Date: Thu, 11 May 2023 04:05:30 GMT
- Title: WeLayout: WeChat Layout Analysis System for the ICDAR 2023 Competition
on Robust Layout Segmentation in Corporate Documents
- Authors: Mingliang Zhang, Zhen Cao, Juntao Liu, Liqiang Niu, Fandong Meng, Jie
Zhou
- Abstract summary: We introduce Weimat, a novel system for segmenting the layout of corporate documents.
Our method significantly surpasses the baseline, securing a top position on the leaderboard with a mAP of 70.0.
- Score: 42.1096906112963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce WeLayout, a novel system for segmenting the
layout of corporate documents, which stands for WeChat Layout Analysis System.
Our approach utilizes a sophisticated ensemble of DINO and YOLO models,
specifically developed for the ICDAR 2023 Competition on Robust Layout
Segmentation. Our method significantly surpasses the baseline, securing a top
position on the leaderboard with a mAP of 70.0. To achieve this performance, we
concentrated on enhancing various aspects of the task, such as dataset
augmentation, model architecture, bounding box refinement, and model ensemble
techniques. Additionally, we trained the data separately for each document
category to ensure a higher mean submission score. We also developed an
algorithm for cell matching to further improve our performance. To identify the
optimal weights and IoU thresholds for our model ensemble, we employed a
Bayesian optimization algorithm called the Tree-Structured Parzen Estimator.
Our approach effectively demonstrates the benefits of combining query-based and
anchor-free models for achieving robust layout segmentation in corporate
documents.
Related papers
- Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training [73.90260246781435]
We present Lory, the first approach that scales such architectures to autoregressive language model pre-training.
We show significant performance gains over parameter-matched dense models on both perplexity and a variety of downstream tasks.
Despite segment-level routing, Lory models achieve competitive performance compared to state-of-the-art MoE models with token-level routing.
arXiv Detail & Related papers (2024-05-06T03:06:33Z) - Interfacing Foundation Models' Embeddings [131.0352288172788]
We present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity.
In light of the interleaved embedding space, we introduce FIND-Bench, which introduces new training and evaluation annotations to the COCO dataset for interleaved segmentation and retrieval.
arXiv Detail & Related papers (2023-12-12T18:58:02Z) - Hierarchical Audio-Visual Information Fusion with Multi-label Joint
Decoding for MER 2023 [51.95161901441527]
In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions.
Deep features extracted from foundation models are used as robust acoustic and visual representations of raw video.
Our final system achieves state-of-the-art performance and ranks third on the leaderboard on MER-MULTI sub-challenge.
arXiv Detail & Related papers (2023-09-11T03:19:10Z) - Ensemble of Anchor-Free Models for Robust Bangla Document Layout
Segmentation [0.0]
We introduce a novel approach designed for the purpose of segmenting the layout of Bangla documents.
Our methodology involves the utilization of a sophisticated ensemble of YOLOv8 models, which were trained for the DL Sprint 2.0 - BUET CSE Fest 2023 Competition.
arXiv Detail & Related papers (2023-08-28T08:24:25Z) - Enhancing Visually-Rich Document Understanding via Layout Structure
Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z) - ICDAR 2023 Competition on Robust Layout Segmentation in Corporate
Documents [3.6700088931938835]
ICDAR has a long tradition in hosting competitions to benchmark the state-of-the-art.
To raise the bar over previous competitions, we engineered a hard competition dataset and proposed the recent DocLayNet dataset for training.
We recognize interesting combinations of recent computer vision models, data augmentation strategies and ensemble methods to achieve remarkable accuracy in the task we posed.
arXiv Detail & Related papers (2023-05-24T09:56:47Z) - Streamlined Framework for Agile Forecasting Model Development towards
Efficient Inventory Management [2.0625936401496237]
This paper proposes a framework for developing forecasting models by streamlining the connections between core components of the developmental process.
The proposed framework enables swift and robust integration of new datasets, experimentation on different algorithms, and selection of the best models.
arXiv Detail & Related papers (2023-04-13T08:52:32Z) - MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the generalization and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z) - Learning-To-Ensemble by Contextual Rank Aggregation in E-Commerce [8.067201256886733]
We propose a new Learning-To-Ensemble framework RAEGO, which replaces the ensemble model with a contextual Rank Aggregator.
RA-EGO has been deployed in our online system and has improved the revenue significantly.
arXiv Detail & Related papers (2021-07-19T03:24:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.