Data Efficient Training of a U-Net Based Architecture for Structured
Documents Localization
- URL: http://arxiv.org/abs/2310.00937v1
- Date: Mon, 2 Oct 2023 07:05:19 GMT
- Title: Data Efficient Training of a U-Net Based Architecture for Structured
Documents Localization
- Authors: Anastasiia Kabeshova, Guillaume Betmont, Julien Lerouge, Evgeny
Stepankevich, Alexis Bergès
- Abstract summary: We propose SDL-Net: a novel U-Net-like encoder-decoder architecture for the localization of structured documents.
Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured document analysis and recognition are essential for modern online
on-boarding processes, and document localization is a crucial step to achieve
reliable key information extraction. While deep learning has become the
standard technique used to solve document analysis problems, real-world
applications in industry still face the limited availability of labelled data
and of computational resources when training or fine-tuning deep learning
models. To tackle these challenges, we propose SDL-Net: a novel U-Net-like
encoder-decoder architecture for the localization of structured documents. Our
approach allows pre-training the encoder of SDL-Net on a generic dataset
containing samples of various document classes, and enables fast and
data-efficient fine-tuning of decoders to support the localization of new
document classes. We conduct extensive experiments on a proprietary dataset of
structured document images to demonstrate the effectiveness and the
generalization capabilities of the proposed approach.
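The training split described in the abstract (one shared, pre-trained encoder plus a lightweight decoder fine-tuned per document class) can be illustrated with a toy sketch. This is not SDL-Net and not the authors' code: the encoder here is a fixed linear map with ReLU, each decoder is a linear head fitted with plain SGD, and the class names `class_a` and `class_b`, the sample values, and all helper names are hypothetical. It only shows that the encoder stays untouched while a new decoder is fitted for each class.

```python
import copy


class SharedEncoder:
    """Toy stand-in for a pre-trained encoder: fixed linear map followed by ReLU."""

    def __init__(self, weights):
        self.weights = weights  # treated as frozen after "pre-training"

    def encode(self, x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.weights]


class Decoder:
    """Toy stand-in for a class-specific decoder: linear head on encoder features."""

    def __init__(self, feat_dim):
        self.weights = [0.0] * feat_dim
        self.bias = 0.0

    def predict(self, feats):
        return sum(w * f for w, f in zip(self.weights, feats)) + self.bias


def mean_squared_error(encoder, decoder, samples):
    errors = [decoder.predict(encoder.encode(x)) - y for x, y in samples]
    return sum(e * e for e in errors) / len(errors)


def fine_tune_decoder(encoder, samples, epochs=200, lr=0.5):
    """Fit a fresh decoder with SGD; only decoder parameters are updated."""
    decoder = Decoder(len(encoder.weights))
    for _ in range(epochs):
        for x, y in samples:
            feats = encoder.encode(x)  # encoder is only read, never modified
            error = decoder.predict(feats) - y
            decoder.weights = [w - lr * error * f for w, f in zip(decoder.weights, feats)]
            decoder.bias -= lr * error
    return decoder


# Hypothetical "pre-trained" encoder weights (assumption, chosen by hand).
encoder = SharedEncoder([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
encoder_before = copy.deepcopy(encoder.weights)

# Hypothetical per-class fine-tuning sets: (input, scalar target) pairs.
datasets = {
    "class_a": [((0.2, 0.1), 0.3), ((0.4, 0.3), 0.7), ((0.3, 0.6), 0.9)],
    "class_b": [((0.5, 0.1), 0.4), ((0.6, 0.4), 0.2), ((0.2, 0.7), 0.0)],
}

# One decoder per document class, all sharing the same frozen encoder.
decoders = {name: fine_tune_decoder(encoder, samples) for name, samples in datasets.items()}

for name, samples in datasets.items():
    before = mean_squared_error(encoder, Decoder(len(encoder.weights)), samples)
    after = mean_squared_error(encoder, decoders[name], samples)
    print(f"{name}: loss before={before:.4f}, after={after:.4f}")
```

Supporting a further document class in this sketch means adding one entry to `datasets` and fitting one more decoder; the encoder weights are never recomputed.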
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian [75.94354349994576]
This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in specialized contexts.
Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models.
The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting.
arXiv Detail & Related papers (2024-07-30T08:50:16Z)
- U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts [9.76730765089929]
U-DIADS-Bib is a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in the fields of computer vision and humanities.
We propose a novel, computer-aided, segmentation pipeline in order to alleviate the burden represented by the time-consuming process of manual annotation.
arXiv Detail & Related papers (2024-01-16T15:11:18Z)
- GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification [8.880856137902947]
We introduce GlobalDoc, a cross-modal transformer-based architecture pre-trained in a self-supervised manner.
GlobalDoc improves the learning of richer semantic concepts by unifying language and visual representations.
For proper evaluation, we also propose two novel document-level downstream VDU tasks: Few-Shot Document Image Classification (DIC) and Content-based Document Image Retrieval (DIR).
arXiv Detail & Related papers (2023-09-11T18:35:14Z)
- Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We systematically review the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities and domains.
We also review the potential pitfalls of IT and the criticism against it, discuss efforts that point out deficiencies in existing strategies, and suggest some avenues for fruitful research.
arXiv Detail & Related papers (2023-08-21T15:35:16Z)
- DocumentNet: Bridging the Data Gap in Document Pre-Training [78.01647768018485]
We propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models.
The collected dataset, named DocumentNet, does not depend on specific document types or entity sets.
Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training.
arXiv Detail & Related papers (2023-06-15T08:21:15Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis [0.6551090704585544]
We propose an open-source deep learning framework, DIVA-DAF, specifically designed for historical document analysis.
It is easy to create one's own tasks with the benefit of powerful modules for loading data, even large data sets.
Thanks to its data module, the framework also makes it possible to reduce model training time significantly.
arXiv Detail & Related papers (2022-01-20T17:02:46Z)
- Deep Learning for Technical Document Classification [6.787004826008753]
This paper describes a novel multimodal deep learning architecture, called TechDoc, for technical document classification.
The trained model can potentially be scaled to millions of real-world technical documents with both text and figures.
arXiv Detail & Related papers (2021-06-27T16:12:47Z)
- Evaluation of a Region Proposal Architecture for Multi-task Document Layout Analysis [0.685316573653194]
Mask-RCNN architecture is designed to address the problem of baseline detection and region segmentation.
We present experimental results on two handwritten text datasets and one handwritten music dataset.
The analyzed architecture yields promising results, outperforming state-of-the-art techniques in all three datasets.
arXiv Detail & Related papers (2021-06-22T14:07:27Z)
- A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It preserves users' sensitive data while providing cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.