Graph-based Deep Generative Modelling for Document Layout Generation
- URL: http://arxiv.org/abs/2107.04357v1
- Date: Fri, 9 Jul 2021 10:49:49 GMT
- Title: Graph-based Deep Generative Modelling for Document Layout Generation
- Authors: Sanket Biswas, Pau Riba, Josep Llad\'os, and Umapada Pal
- Abstract summary: We have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts.
It is also the first graph-based approach for document layout generation task experimented on administrative document images.
- Score: 14.907063348987075
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: One of the major prerequisites for any deep learning approach is the
availability of large-scale training data. When dealing with scanned document
images in real world scenarios, the principal information of its content is
stored in the layout itself. In this work, we have proposed an automated deep
generative model using Graph Neural Networks (GNNs) to generate synthetic data
with highly variable and plausible document layouts that can be used to train
document interpretation systems, in this case, specially in digital mailroom
applications. It is also the first graph-based approach for document layout
generation task experimented on administrative document images, in this case,
invoices.
Related papers
- VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents [66.42579289213941]
Retrieval-augmented generation (RAG) is an effective technique that enables large language models to utilize external knowledge sources for generation.
In this paper, we introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline.
In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.
arXiv Detail & Related papers (2024-10-14T15:04:18Z) - Visually Guided Generative Text-Layout Pre-training for Document Intelligence [51.09853181377696]
We propose visually guided generative text-pre-training, named ViTLP.
Given a document image, the model optimize hierarchical language and layout modeling objectives to generate the interleaved text and layout sequence.
ViTLP can function as a native OCR model to localize and recognize texts of document images.
arXiv Detail & Related papers (2024-03-25T08:00:43Z) - Enhancing Visually-Rich Document Understanding via Layout Structure
Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z) - Unifying Vision, Text, and Layout for Universal Document Processing [105.36490575974028]
We propose a Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation.
Our method sets the state-of-the-art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites.
arXiv Detail & Related papers (2022-12-05T22:14:49Z) - ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich
Document Understanding [52.3895498789521]
We propose ERNIE, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, and learn the proper reading order of documents.
Experimental results show ERNIE achieves superior performance on various downstream tasks, setting new state-of-the-art on key information, and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z) - Doc2Graph: a Task Agnostic Document Understanding Framework based on
Graph Neural Networks [0.965964228590342]
We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model.
We evaluate our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection.
arXiv Detail & Related papers (2022-08-23T19:48:10Z) - Synthetic Document Generator for Annotation-free Layout Recognition [15.657295650492948]
We describe a synthetic document generator that automatically produces realistic documents with labels for spatial positions, extents and categories of layout elements.
We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
arXiv Detail & Related papers (2021-11-11T01:58:44Z) - DocSynth: A Layout Guided Approach for Controllable Document Image
Synthesis [16.284895792639137]
This paper presents a novel approach, called Doc Synth, to automatically synthesize document images based on a given layout.
In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed Doc Synth model learns to generate a set of realistic document images.
The results highlight that our model can successfully generate realistic and diverse document images with multiple objects.
arXiv Detail & Related papers (2021-07-06T14:24:30Z) - End-to-End Information Extraction by Character-Level Embedding and
Multi-Stage Attentional U-Net [0.9137554315375922]
We propose a novel deep learning architecture for end-to-end information extraction on the 2D character-grid embedding of the document.
We show that our model outperforms the baseline U-Net architecture by a large margin while using 40% fewer parameters.
arXiv Detail & Related papers (2021-06-02T05:42:51Z) - DOC2PPT: Automatic Presentation Slides Generation from Scientific
Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z) - Multiple Document Datasets Pre-training Improves Text Line Detection
With Deep Neural Networks [2.5352713493505785]
We introduce a fully convolutional network for the document layout analysis task.
Our method Doc-UFCN relies on a U-shaped model trained from scratch for detecting objects from historical documents.
We show that Doc-UFCN outperforms state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2020-12-28T09:48:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.