CanvasVAE: Learning to Generate Vector Graphic Documents
- URL: http://arxiv.org/abs/2108.01249v1
- Date: Tue, 3 Aug 2021 02:14:25 GMT
- Title: CanvasVAE: Learning to Generate Vector Graphic Documents
- Authors: Kota Yamaguchi
- Abstract summary: We learn a generative model of vector graphic documents using a dataset of design templates from an online service.
In experiments, we show that our model, named CanvasVAE, constitutes a strong baseline for generative modeling of vector graphic documents.
- Score: 1.8478165393315746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vector graphic documents present visual elements in a resolution-free,
compact format and are often seen in creative applications. In this work, we
attempt to learn a generative model of vector graphic documents. We define
vector graphic documents by a multi-modal set of attributes associated with a
canvas and a sequence of visual elements such as shapes, images, or texts, and
train variational auto-encoders to learn the representation of the documents.
We collect a new dataset of design templates from an online service that
features complete document structure including occluded elements. In
experiments, we show that our model, named CanvasVAE, constitutes a strong
baseline for generative modeling of vector graphic documents.
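The abstract sketches the core idea: a document is a canvas with multi-modal attributes plus a sequence of visual elements, and a variational auto-encoder learns a latent representation of that structure. Below is a minimal illustrative sketch of such an encoder, not the paper's actual architecture; all field names, vocabulary sizes, and dimensions are assumptions.
```python
# A minimal, illustrative encoder for the document structure described above:
# a canvas with categorical attributes plus a sequence of elements, mapped to a
# Gaussian latent. Field names, vocabulary sizes, and dimensions are assumed
# for illustration and do not reproduce the paper's actual model.
import torch
import torch.nn as nn

class DocumentEncoderSketch(nn.Module):
    def __init__(self, num_canvas_sizes=32, num_element_types=8, d_model=128, z_dim=64):
        super().__init__()
        self.canvas_emb = nn.Embedding(num_canvas_sizes, d_model)   # canvas attribute (assumed)
        self.type_emb = nn.Embedding(num_element_types, d_model)    # element category (assumed)
        self.geom_proj = nn.Linear(4, d_model)                      # (left, top, width, height)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_mu = nn.Linear(d_model, z_dim)
        self.to_logvar = nn.Linear(d_model, z_dim)

    def forward(self, canvas_size_id, element_type_ids, element_geometry):
        # canvas_size_id: (B,), element_type_ids: (B, T), element_geometry: (B, T, 4)
        canvas_tok = self.canvas_emb(canvas_size_id).unsqueeze(1)                 # (B, 1, d)
        element_tok = self.type_emb(element_type_ids) + self.geom_proj(element_geometry)
        tokens = torch.cat([canvas_tok, element_tok], dim=1)                      # (B, 1+T, d)
        pooled = self.encoder(tokens).mean(dim=1)                                 # (B, d)
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)                   # reparameterization
        return z, mu, logvar
```
A decoder that reconstructs the canvas and element attributes from z, together with a KL regularizer on the posterior, would complete the usual VAE objective; the paper's actual attribute set, losses, and hyperparameters are given in the full text.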
Related papers
- Visually Guided Generative Text-Layout Pre-training for Document Intelligence [51.09853181377696]
We propose visually guided generative text-layout pre-training, named ViTLP.
Given a document image, the model optimizes hierarchical language and layout modeling objectives to generate the interleaved text and layout sequence.
ViTLP can function as a native OCR model to localize and recognize texts of document images.
arXiv Detail & Related papers (2024-03-25T08:00:43Z)
- Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
- Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models [80.75258849913574]
In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image?
We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images.
arXiv Detail & Related papers (2023-06-08T17:02:15Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
- Unifying Vision, Text, and Layout for Universal Document Processing [105.36490575974028]
We propose a Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation.
Our method sets the state-of-the-art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites.
arXiv Detail & Related papers (2022-12-05T22:14:49Z)
- Composition-aware Graphic Layout GAN for Visual-textual Presentation Designs [24.29890251913182]
We study the graphic layout generation problem of producing high-quality visual-textual presentation designs for given images.
We propose a deep generative model, dubbed composition-aware graphic layout GAN (CGL-GAN), to synthesize layouts based on the global and spatial visual contents of input images.
arXiv Detail & Related papers (2022-04-30T16:42:13Z)
- Synthetic Document Generator for Annotation-free Layout Recognition [15.657295650492948]
We describe a synthetic document generator that automatically produces realistic documents with labels for spatial positions, extents and categories of layout elements.
We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
arXiv Detail & Related papers (2021-11-11T01:58:44Z)
- Contrastive Document Representation Learning with Graph Attention Networks [18.22722084624321]
We propose to use a graph attention network on top of a pretrained Transformer model to learn document embeddings.
In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large unlabeled corpus.
arXiv Detail & Related papers (2021-10-20T21:05:02Z)
- SketchEmbedNet: Learning Novel Concepts by Imitating Drawings [125.45799722437478]
We explore properties of image representations learned by training a model to produce sketches of images.
We show that this generative, class-agnostic model produces informative embeddings of images from novel examples, classes, and even novel datasets in a few-shot setting.
arXiv Detail & Related papers (2020-08-27T16:43:28Z)
- Graphical Object Detection in Document Images [30.48863304419383]
We present a novel end-to-end trainable deep learning framework, called Graphical Object Detection (GOD), to localize graphical objects in document images.
Our framework is data-driven and does not require any meta-data to locate graphical objects in document images.
Our model yields promising results as compared to state-of-the-art techniques.
arXiv Detail & Related papers (2020-08-25T06:35:57Z)