Variational Transformer Networks for Layout Generation
- URL: http://arxiv.org/abs/2104.02416v1
- Date: Tue, 6 Apr 2021 10:45:53 GMT
- Title: Variational Transformer Networks for Layout Generation
- Authors: Diego Martin Arroyo, Janis Postels and Federico Tombari
- Abstract summary: We exploit the properties of self-attention layers to capture relationships between elements in a layout.
Our proposed Variational Transformer Network (VTN) is capable of learning margins, alignments and other global design rules without explicit supervision.
- Score: 39.25496294840713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models able to synthesize layouts of different kinds (e.g.
documents, user interfaces or furniture arrangements) are a useful tool to aid
design processes and as a first step in the generation of synthetic data, among
other tasks. We exploit the properties of self-attention layers to capture high
level relationships between elements in a layout, and use these as the building
blocks of the well-known Variational Autoencoder (VAE) formulation. Our
proposed Variational Transformer Network (VTN) is capable of learning margins,
alignments and other global design rules without explicit supervision. Layouts
sampled from our model have a high degree of resemblance to the training data,
while demonstrating appealing diversity. In an extensive evaluation on publicly
available benchmarks for different layout types VTNs achieve state-of-the-art
diversity and perceptual quality. Additionally, we show the capabilities of
this method as part of a document layout detection pipeline.
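The abstract describes using self-attention layers as the building blocks of a VAE, so that each layout element can attend to all others when the layout is encoded into a latent code. As a rough single-layer illustration only (not the authors' implementation: the 5-tuple element encoding, all dimensions, and the random weights are invented for this sketch), the encoder side might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n_elements, d) -- one row per layout element.
    # Scaled dot-product attention lets every element see every other,
    # which is how relations like alignment can be captured.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

# Toy layout: 3 elements, each (class_id, x, y, w, h) in [0, 1]
# (a hypothetical encoding, chosen only for this example).
layout = np.array([
    [0.0, 0.10, 0.05, 0.80, 0.10],  # e.g. a title block
    [1.0, 0.10, 0.20, 0.35, 0.60],  # a text column
    [1.0, 0.55, 0.20, 0.35, 0.60],  # a second, aligned column
])

d = 8  # hidden size, arbitrary for the sketch
W_embed = rng.normal(size=(5, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W_mu, W_logvar = rng.normal(size=(d, 4)), rng.normal(size=(d, 4))

# Encoder: embed elements, mix them with self-attention, pool to one vector.
H = self_attention(layout @ W_embed, Wq, Wk, Wv)
pooled = H.mean(axis=0)

# VAE head: reparameterization trick, z = mu + sigma * eps.
mu, logvar = pooled @ W_mu, pooled @ W_logvar
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

print(z.shape)  # a single latent code summarizing the whole layout
```

A full VTN would stack many such attention blocks with feed-forward layers and pair the encoder with an autoregressive transformer decoder; this sketch only shows how attention over layout elements feeds the standard VAE reparameterization.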
Related papers
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity Alignment [17.592908862768425]
We propose a novel MMEA transformer, called MoAlign, that hierarchically introduces neighbor features, multi-modal attributes, and entity types.
Taking advantage of the transformer's ability to better integrate multiple information, we design a hierarchical modifiable self-attention block in a transformer encoder.
Our approach outperforms strong competitors and achieves excellent entity alignment performance.
arXiv Detail & Related papers (2023-10-10T07:06:06Z)
- LayoutDM: Transformer-based Diffusion Model for Layout Generation [0.6445605125467572]
A transformer-based denoising diffusion probabilistic model (DDPM) is proposed to generate high-quality layouts.
Transformer-based conditional Layout Denoiser is proposed to generate samples from noised layout data.
Our method outperforms state-of-the-art generative models in terms of quality and diversity.
arXiv Detail & Related papers (2023-05-04T05:51:35Z)
- Unifying Layout Generation with a Decoupled Diffusion Model [26.659337441975143]
Layout generation is a crucial task for reducing the burden of heavy-duty graphic design work for formatted scenes, e.g., publications, documents, and user interfaces (UIs).
We propose a layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model.
Our proposed LDGM can generate layouts either from scratch or conditional on arbitrary available attributes.
arXiv Detail & Related papers (2023-03-09T05:53:32Z)
- Demystify Transformers & Convolutions in Modern Image Deep Networks [82.32018252867277]
This paper aims to identify the real gains of popular convolution and attention operators through a detailed study.
We find that the key difference among these feature transformation modules, such as attention or convolution, lies in their spatial feature aggregation approach.
Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs.
arXiv Detail & Related papers (2022-11-10T18:59:43Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 accuracy on ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose a novel Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
In each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
arXiv Detail & Related papers (2021-06-07T05:31:06Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
- Point Cloud Learning with Transformer [2.3204178451683264]
We introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT)
Specifically, a point pyramid transformer is investigated to model features with diverse resolutions or scales.
A multi-level transformer module is designed to aggregate contextual information from different levels of each scale and enhance their interactions.
arXiv Detail & Related papers (2021-04-28T08:39:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.