Data Efficient Training of a U-Net Based Architecture for Structured
Documents Localization
- URL: http://arxiv.org/abs/2310.00937v1
- Date: Mon, 2 Oct 2023 07:05:19 GMT
- Title: Data Efficient Training of a U-Net Based Architecture for Structured
Documents Localization
- Authors: Anastasiia Kabeshova, Guillaume Betmont, Julien Lerouge, Evgeny
Stepankevich, Alexis Bergès
- Abstract summary: We propose SDL-Net: a novel U-Net like encoder-decoder architecture for the localization of structured documents.
Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured document analysis and recognition are essential for modern online
on-boarding processes, and document localization is a crucial step to achieve
reliable key information extraction. While deep-learning has become the
standard technique used to solve document analysis problems, real-world
applications in industry still face the limited availability of labelled data
and of computational resources when training or fine-tuning deep-learning
models. To tackle these challenges, we propose SDL-Net: a novel U-Net like
encoder-decoder architecture for the localization of structured documents. Our
approach allows pre-training the encoder of SDL-Net on a generic dataset
containing samples of various document classes, and enables fast and
data-efficient fine-tuning of decoders to support the localization of new
document classes. We conduct extensive experiments on a proprietary dataset of
structured document images to demonstrate the effectiveness and the
generalization capabilities of the proposed approach.
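The abstract describes a shared encoder pre-trained on a generic dataset, with lightweight decoders fine-tuned per document class. The following is a minimal, hypothetical sketch of that training scheme (not the authors' implementation): the class names, channel sizes, and parameter layout are all illustrative assumptions, and only parameter bookkeeping is modeled, not the actual convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_params(c_in, c_out):
    # Illustrative 3x3 convolution kernel tensor; sizes are assumptions.
    return {"w": rng.standard_normal((c_out, c_in, 3, 3)) * 0.01}

class SDLNetSketch:
    """Hypothetical sketch of the SDL-Net training scheme: one shared
    U-Net-style encoder, plus one small decoder per document class."""

    def __init__(self, channels=(3, 16, 32)):
        # Encoder: a stack of conv blocks, pre-trained on the generic dataset.
        self.encoder = [conv_params(channels[i], channels[i + 1])
                        for i in range(len(channels) - 1)]
        self.encoder_frozen = False
        self.decoders = {}

    def freeze_encoder(self):
        # After pre-training, the encoder is kept fixed for new classes.
        self.encoder_frozen = True

    def add_decoder(self, doc_class, channels=(32, 16, 1)):
        # A lightweight decoder fine-tuned for each new document class,
        # producing a single-channel localization mask.
        self.decoders[doc_class] = [conv_params(channels[i], channels[i + 1])
                                    for i in range(len(channels) - 1)]

    def trainable_parameters(self, doc_class):
        # Only the class-specific decoder is updated once the encoder
        # is frozen, which is what makes fine-tuning data-efficient.
        layers = [] if self.encoder_frozen else list(self.encoder)
        return layers + self.decoders[doc_class]
```

With these (assumed) channel sizes, freezing the encoder roughly halves the number of trainable parameters for a new class, which illustrates why adding a document class needs far less labelled data than training the full network.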
Related papers
- U-DIADS-Bib: a full and few-shot pixel-precise dataset for document
layout analysis of ancient manuscripts [9.76730765089929]
U-DIADS-Bib is a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in the fields of computer vision and humanities.
We propose a novel, computer-aided, segmentation pipeline in order to alleviate the burden represented by the time-consuming process of manual annotation.
arXiv Detail & Related papers (2024-01-16T15:11:18Z) - Serving Deep Learning Model in Relational Databases [72.72372281808694]
Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains.
We highlight three pivotal paradigms: The state-of-the-art DL-Centric architecture offloads DL computations to dedicated DL frameworks.
The potential UDF-Centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the database system.
The potential Relation-Centric architecture aims to represent a large-scale tensor computation through operators.
arXiv Detail & Related papers (2023-10-07T06:01:35Z) - Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications.
We also review the potential pitfalls of IT and the criticism against it, point out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
arXiv Detail & Related papers (2023-08-21T15:35:16Z) - DocumentNet: Bridging the Data Gap in Document Pre-Training [78.01647768018485]
We propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models.
The collected dataset, named DocumentNet, does not depend on specific document types or entity sets.
Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training.
arXiv Detail & Related papers (2023-06-15T08:21:15Z) - Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z) - DIVA-DAF: A Deep Learning Framework for Historical Document Image
Analysis [0.6551090704585544]
We propose an open-source deep learning framework, DIVA-DAF, specifically designed for historical document analysis.
Its powerful modules for loading data make it easy to create one's own tasks, even with large datasets.
Thanks to its data module, the framework can also significantly reduce model training time.
arXiv Detail & Related papers (2022-01-20T17:02:46Z) - One-shot Key Information Extraction from Document with Deep Partial
Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z) - Deep Learning for Technical Document Classification [6.787004826008753]
This paper describes a novel multimodal deep learning architecture, called TechDoc, for technical document classification.
The trained model can potentially be scaled to millions of real-world technical documents with both text and figures.
arXiv Detail & Related papers (2021-06-27T16:12:47Z) - Evaluation of a Region Proposal Architecture for Multi-task Document
Layout Analysis [0.685316573653194]
A Mask-RCNN architecture is designed to address the problems of baseline detection and region segmentation.
We present experimental results on two handwritten text datasets and one handwritten music dataset.
The analyzed architecture yields promising results, outperforming state-of-the-art techniques in all three datasets.
arXiv Detail & Related papers (2021-06-22T14:07:27Z) - Vision-Based Layout Detection from Scientific Literature using Recurrent
Convolutional Neural Networks [12.221478896815292]
We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD).
SLLD is a shared subtask of several information extraction problems.
Our results show good improvement with fine-tuning of a pre-trained base network.
arXiv Detail & Related papers (2020-10-18T23:50:28Z) - A Privacy-Preserving Distributed Architecture for
Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It is able to preserve the user sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.