UnSupDLA: Towards Unsupervised Document Layout Analysis
- URL: http://arxiv.org/abs/2406.06236v1
- Date: Mon, 10 Jun 2024 13:06:28 GMT
- Title: UnSupDLA: Towards Unsupervised Document Layout Analysis
- Authors: Talha Uddin Sheikh, Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal
- Abstract summary: A critical but frequently overlooked problem is the scarcity of labeled data needed for layout analysis.
We employ a vision-based approach for analyzing document layouts designed to train a network without labels.
- Score: 11.574592219976823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document layout analysis is a key area in document research, involving techniques like text mining and visual analysis. Despite various methods developed to tackle layout analysis, a critical but frequently overlooked problem is the scarcity of labeled data needed for analyses. With the rise of internet use, an overwhelming number of documents are now available online, making the process of accurately labeling them for research purposes increasingly challenging and labor-intensive. Moreover, the diversity of documents online presents a unique set of challenges in maintaining the quality and consistency of these labels, further complicating document layout analysis in the digital era. To address this, we employ a vision-based approach for analyzing document layouts designed to train a network without labels. Instead, we focus on pre-training, initially generating simple object masks from the unlabeled document images. These masks are then used to train a detector, enhancing object detection and segmentation performance. The model's effectiveness is further amplified through several unsupervised training iterations, continuously refining its performance. This approach significantly advances document layout analysis, particularly in precision and efficiency, without labels.
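The pipeline the abstract describes (coarse masks from unlabeled pages, a detector trained on those masks, then repeated rounds of retraining on the detector's own predictions) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the mask generator here is a deliberately simple stand-in based on ink thresholding and blank-row gaps, and `coarse_masks`, `self_train`, `train_fn`, `ink_threshold`, and `min_gap` are all hypothetical names introduced for the example.

```python
import numpy as np

def _band_mask(ink, top, bottom):
    """Boolean mask covering the tight bounding box of ink in rows top..bottom."""
    mask = np.zeros_like(ink, dtype=bool)
    cols = np.flatnonzero(ink[top:bottom + 1].any(axis=0))
    mask[top:bottom + 1, cols[0]:cols[-1] + 1] = True
    return mask

def coarse_masks(page, ink_threshold=128, min_gap=2):
    """Stand-in pseudo-mask generator (assumption, not the paper's technique).

    Splits a grayscale page into horizontal bands of ink separated by at
    least `min_gap` blank rows and returns one box-shaped mask per band.
    """
    ink = page < ink_threshold            # dark pixels count as ink
    rows_with_ink = ink.any(axis=1)
    masks, start, end, blank_run = [], None, None, 0
    for y, has_ink in enumerate(rows_with_ink):
        if has_ink:
            if start is None:
                start = y
            end = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:      # gap is wide enough: close the band
                masks.append(_band_mask(ink, start, end))
                start = None
    if start is not None:                 # band that runs to the page bottom
        masks.append(_band_mask(ink, start, end))
    return masks

def self_train(pages, train_fn, rounds=3):
    """Iterative unsupervised training loop.

    `train_fn(pages, pseudo_masks)` is a hypothetical callback that fits a
    detector and returns a callable mapping a page to a list of masks.
    """
    pseudo = [coarse_masks(p) for p in pages]    # round 0: masks without labels
    model = None
    for _ in range(rounds):
        model = train_fn(pages, pseudo)          # train on current pseudo-masks
        pseudo = [model(p) for p in pages]       # relabel with the new detector
    return model
```

In practice `train_fn` would wrap a real detection or segmentation model; the loop structure is the only part meant to mirror the abstract, and no human labels enter at any step.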
Related papers
- U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts [9.76730765089929]
U-DIADS-Bib is a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in the fields of computer vision and humanities.
We propose a novel, computer-aided segmentation pipeline in order to alleviate the burden represented by the time-consuming process of manual annotation.
(2024-01-16T15:11:18Z)
- On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Few-shot visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
(2023-11-01T17:51:43Z)
- Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis [3.231170156689185]
Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques.
One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting the content and spatial relationships of layout, image, and text.
(2023-08-29T16:58:03Z)
- SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a known problem to the documents research community.
With growing internet connectivity to personal life, an enormous amount of documents has become available in the public domain.
We address this challenge using self-supervision and unlike, the few existing self-supervised document segmentation approaches.
(2023-05-01T12:47:55Z)
- Efficient few-shot learning for pixel-precise handwritten document layout analysis [11.453393410516991]
We propose an efficient few-shot learning framework for layout analysis.
It achieves performance comparable to current state-of-the-art fully supervised methods on the publicly available DIVA-HisDB dataset.
(2022-10-27T16:03:52Z)
- Metrics reloaded: Recommendations for image analysis validation [59.60445111432934]
Metrics Reloaded is a comprehensive framework guiding researchers in the problem-aware selection of metrics.
The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint.
Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics.
(2022-06-03T15:56:51Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
(2022-04-22T21:47:04Z)
- Synthetic Document Generator for Annotation-free Layout Recognition [15.657295650492948]
We describe a synthetic document generator that automatically produces realistic documents with labels for spatial positions, extents and categories of layout elements.
We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
(2021-11-11T01:58:44Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
(2021-11-09T13:30:34Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
(2021-09-26T07:45:53Z)
- Active Learning from Crowd in Document Screening [76.9545252341746]
We focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently.
We propose a multi-label active learning screening-specific sampling technique: objective-aware sampling.
We demonstrate that objective-aware sampling significantly outperforms the state-of-the-art active learning sampling strategies.
(2020-11-11T16:17:28Z)