Unveiling Document Structures with YOLOv5 Layout Detection
- URL: http://arxiv.org/abs/2309.17033v1
- Date: Fri, 29 Sep 2023 07:45:10 GMT
- Title: Unveiling Document Structures with YOLOv5 Layout Detection
- Authors: Herman Sugiharto, Yorissa Silviana, Yani Siti Nurpazrin
- Abstract summary: This research investigates the utilization of YOLOv5, a cutting-edge computer vision model, for the purpose of rapidly identifying document layouts and extracting unstructured data.
The main objective is to create an autonomous system that can effectively recognize document layouts and extract unstructured data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The current digital environment is characterized by the widespread presence
of data, particularly unstructured data, which poses many issues in sectors
including finance, healthcare, and education. Conventional techniques for data
extraction encounter difficulties in dealing with the inherent variety and
complexity of unstructured data, hence requiring the adoption of more efficient
methodologies. This research investigates the utilization of YOLOv5, a
cutting-edge computer vision model, for the purpose of rapidly identifying
document layouts and extracting unstructured data.
The present study establishes a conceptual framework for delineating the
notion of "objects" as they pertain to documents, incorporating various
elements such as paragraphs, tables, photos, and other constituent parts. The
main objective is to create an autonomous system that can effectively recognize
document layouts and extract unstructured data, hence improving the
effectiveness of data extraction.
In the conducted examination, the YOLOv5 model exhibits notable effectiveness
in the task of document layout identification, attaining a high accuracy rate
along with a precision value of 0.91, a recall value of 0.971, an F1-score of
0.939, and an area under the receiver operating characteristic curve (AUC-ROC)
of 0.975. The remarkable performance of this system optimizes the process of
extracting textual and tabular data from document images. Its prospective
applications are not limited to document analysis but can encompass
unstructured data from diverse sources, such as audio data.
This study lays the foundation for future investigations into the wider
applicability of YOLOv5 in managing various types of unstructured data,
offering potential for novel applications across multiple domains.
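As a sanity check on the metrics reported in the abstract, the F1-score can be recomputed from the stated precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the helper name `f1_score` and the 0.001 tolerance are assumptions made here, not part of the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as stated in the abstract (already rounded by the authors).
reported_precision = 0.91
reported_recall = 0.971
reported_f1 = 0.939

computed_f1 = f1_score(reported_precision, reported_recall)

# Hypothetical tolerance: allow for rounding of the published inputs.
is_consistent = abs(computed_f1 - reported_f1) < 1e-3

print(f"computed F1 = {computed_f1:.4f}, consistent = {is_consistent}")
```

With these rounded inputs the recomputed value is about 0.9395, within 0.001 of the reported 0.939. The small gap may simply reflect rounding of the precision and recall before publication.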
Related papers
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models [63.466265039007816]
We present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community.
We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
- On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
- Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents [8.49076413640561]
The model is pre-trained and subsequently fine-tuned for various document image analysis tasks.
The proposed model achieved impressive results across all tasks, with an accuracy of 95.87% on the RVL-CDIP dataset for document classification.
arXiv Detail & Related papers (2023-10-25T10:22:30Z)
- Information Extraction in Domain and Generic Documents: Findings from Heuristic-based and Data-driven Approaches [0.0]
Information extraction (IE) plays an important role in natural language processing.
Document genre and length influence IE tasks.
No single method demonstrated overwhelming performance in both tasks.
arXiv Detail & Related papers (2023-06-30T20:43:27Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution [48.693941280097974]
We propose a large-scale dataset consisting of camera images for visual information extraction (VIE).
We propose a novel framework for end-to-end VIE that combines the stages of OCR and information extraction in an end-to-end learning fashion.
We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods has a distinguishable drop from SROIE to our proposed dataset due to the larger variance of layout and entities.
arXiv Detail & Related papers (2023-05-12T14:11:47Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- Extracting Variable-Depth Logical Document Hierarchy from Long Documents: Method, Evaluation, and Application [21.270184491603864]
We develop a framework, namely Hierarchy Extraction from Long Document (HELD), where we "sequentially" insert each physical object at the proper position of the current tree.
Experiments are conducted on thousands of long documents from the Chinese financial market, the English financial market, and English scientific publications.
We show that logical document hierarchy can be employed to significantly improve the performance of the downstream passage retrieval task.
arXiv Detail & Related papers (2021-05-14T06:26:22Z)
- Learning from similarity and information extraction from structured documents [0.0]
The aim is to improve micro F1 of per-word classification on a huge real-world document dataset.
Results confirm that all proposed architecture parts are required to beat the previous results.
The best model improves the previous state-of-the-art results by an 8.25 gain in F1 score.
arXiv Detail & Related papers (2020-10-17T21:34:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.