ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents
- URL: http://arxiv.org/abs/2305.14962v1
- Date: Wed, 24 May 2023 09:56:47 GMT
- Title: ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents
- Authors: Christoph Auer, Ahmed Nassar, Maksym Lysak, Michele Dolfi, Nikolaos Livathinos, Peter Staar
- Abstract summary: ICDAR has a long tradition in hosting competitions to benchmark the state-of-the-art.
To raise the bar over previous competitions, we engineered a hard competition dataset and proposed the recent DocLayNet dataset for training.
We recognize interesting combinations of recent computer vision models, data augmentation strategies and ensemble methods to achieve remarkable accuracy in the task we posed.
- Score: 3.6700088931938835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transforming documents into machine-processable representations is a
challenging task due to their complex structures and variability in formats.
Recovering the layout structure and content from PDF files or scanned material
has remained a key problem for decades. ICDAR has a long tradition in hosting
competitions to benchmark the state-of-the-art and encourage the development of
novel solutions to document layout understanding. In this report, we present
the results of our ICDAR 2023 Competition on Robust Layout Segmentation
in Corporate Documents, which posed the challenge to accurately segment the
page layout in a broad range of document styles and domains, including
corporate reports, technical literature and patents. To raise the bar over
previous competitions, we engineered a hard competition dataset and proposed
the recent DocLayNet dataset for training. We recorded 45 team registrations
and received official submissions from 21 teams. In the presented solutions, we
recognize interesting combinations of recent computer vision models, data
augmentation strategies and ensemble methods to achieve remarkable accuracy in
the task we posed. A clear trend towards adoption of vision-transformer based
methods is evident. The results demonstrate substantial progress towards
achieving robust and highly generalizing methods for document layout
understanding.
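The abstract credits part of the reported accuracy to ensemble methods without naming them. As a minimal sketch of one common way to ensemble object detectors, assuming each model outputs class-labelled bounding boxes with confidence scores, the predictions of all models can be pooled and then filtered with per-class non-maximum suppression (NMS). This is a generic stand-in, not the technique of any particular team; the data layout and the names `iou` and `ensemble_nms` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical data model: a box is (x0, y0, x1, y1) in page coordinates,
# and a prediction is (box, confidence score, class label).
Box = Tuple[float, float, float, float]
Pred = Tuple[Box, float, str]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ensemble_nms(model_outputs: List[List[Pred]], iou_thr: float = 0.5) -> List[Pred]:
    """Pool every model's predictions, then run per-class greedy NMS:
    a box is kept unless an already-kept, higher-scoring box of the same
    class overlaps it with IoU above iou_thr."""
    pooled = [p for preds in model_outputs for p in preds]
    pooled.sort(key=lambda p: p[1], reverse=True)  # highest confidence first
    kept: List[Pred] = []
    for box, score, label in pooled:
        if all(k_label != label or iou(box, k_box) <= iou_thr
               for k_box, _, k_label in kept):
            kept.append((box, score, label))
    return kept


if __name__ == "__main__":
    # Two hypothetical detectors that agree on the title but see different regions.
    model_a = [((0, 0, 100, 20), 0.90, "Title"), ((0, 30, 100, 200), 0.80, "Text")]
    model_b = [((2, 1, 101, 21), 0.95, "Title"), ((0, 210, 100, 300), 0.70, "Table")]
    for box, score, label in ensemble_nms([model_a, model_b]):
        print(label, score, box)
```

In this toy run the two overlapping "Title" boxes collapse into the higher-scoring one, while the "Text" and "Table" regions, each found by only one model, both survive. NMS keeps a single box per overlapping cluster; fusion schemes such as weighted boxes fusion instead average the coordinates of the cluster, trading simplicity for boxes that reflect every model's estimate.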
Related papers
- Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction [23.47150047875133]
Document parsing is essential for converting unstructured and semi-structured documents into machine-readable data.
Document parsing plays an indispensable role in both knowledge base construction and training data generation.
This paper discusses the challenges faced by modular document parsing systems and vision-language models in handling complex layouts.
Date: 2024-10-28T16:11:35Z
- DocSynthv2: A Practical Autoregressive Modeling for Document Generation [43.84027661517748]
This paper proposes a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model.
Our model, distinct in its integration of both layout and textual cues, marks a step beyond existing layout-generation approaches.
Date: 2024-06-12T16:00:16Z
- EFaR 2023: Efficient Face Recognition Competition [51.77649060180531]
The paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023).
The competition received 17 submissions from 6 different teams.
The submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and model size.
Date: 2023-08-08T09:58:22Z
- ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images [198.35937007558078]
The competition opened on 30th December, 2022 and closed on 24th March, 2023.
There are 35 participants and 91 valid submissions received for Track 1, and 15 participants and 26 valid submissions received for Track 2.
According to the performance of the submissions, we believe there is still a large gap in the expected information extraction performance for complex and zero-shot scenarios.
Date: 2023-06-05T22:20:52Z
- ICDAR 2023 Competition on Hierarchical Text Detection and Recognition [60.68100769639923]
The competition aims to promote research into deep learning models and systems that can jointly perform text detection and recognition.
We present details of the proposed competition organization, including tasks, datasets, evaluations, and schedule.
During the competition period (from January 2nd 2023 to April 1st 2023), at least 50 submissions from more than 20 teams were made in the 2 proposed tasks.
Date: 2023-05-16T18:56:12Z
- WeLayout: WeChat Layout Analysis System for the ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents [42.1096906112963]
We introduce WeLayout, a novel system for segmenting the layout of corporate documents.
Our method significantly surpasses the baseline, securing a top position on the leaderboard with an mAP of 70.0.
Date: 2023-05-11T04:05:30Z
- ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, and learn the proper reading order of documents.
Experimental results show ERNIE-Layout achieves superior performance on various downstream tasks, setting a new state of the art on key information extraction and document question answering.
Date: 2022-10-12T12:59:24Z
- ICDAR 2021 Competition on Components Segmentation Task of Document Photos [63.289361617237944]
Three challenge tasks were proposed entailing different segmentation assignments to be performed on a provided dataset.
The collected data are from several types of Brazilian ID documents, whose personal information was conveniently replaced.
Different Deep Learning models were applied by the entrants with diverse strategies to achieve the best results in each of the tasks.
Date: 2021-06-16T00:49:58Z
- DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
Date: 2021-01-28T03:21:17Z
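The WeLayout entry reports its leaderboard result as an mAP of 70.0, but this listing does not say how mAP is computed. Assuming the COCO-style box mAP averaged over IoU thresholds 0.50 to 0.95, the metric DocLayNet reports for its baselines, the core of the computation for a single class can be sketched as below. The function names are hypothetical, and the sketch simplifies the official COCO evaluator: it scores one list of predictions against one list of ground-truth boxes, whereas the real evaluator pools detections over all pages, averages AP over classes, and applies further rules (for example, a cap on detections per image).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Scored = Tuple[Box, float]               # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Scored], gts: List[Box], thr: float) -> float:
    """COCO-style AP for one class at one IoU threshold: greedy matching in
    descending score order, then 101-point interpolated precision."""
    if not gts:
        raise ValueError("AP is undefined without ground-truth boxes")
    matched = [False] * len(gts)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    for rank, (box, _) in enumerate(ranked, start=1):
        # Match to the unmatched ground truth with the highest IoU >= thr, if any.
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # Make precision non-increasing from the right (the precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sample precision at recall levels 0.00, 0.01, ..., 1.00.
    total = 0.0
    for k in range(101):
        r = k / 100
        total += next((p for p, rec in zip(precisions, recalls) if rec >= r), 0.0)
    return total / 101


def map_50_95(preds: List[Scored], gts: List[Box]) -> float:
    """Average AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [0.50 + 0.05 * i for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gts = [(0, 0, 100, 100)]
    preds = [((0, 0, 100, 72), 0.9)]  # IoU 0.72 with the ground-truth box
    print(map_50_95(preds, gts))  # 0.5: matched at thresholds 0.50 to 0.70 only
```

Averaging over ten thresholds is what makes this metric demanding for layout segmentation: a box that finds the right region but clips part of it still loses credit at the stricter thresholds, as the toy example shows.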