Related papers: LayoutTransformer: Layout Generation and Completion with Self-attention

LayoutTransformer: Layout Generation and Completion with Self-attention

URL: http://arxiv.org/abs/2006.14615v2
Date: Thu, 30 Sep 2021 16:44:42 GMT
Title: LayoutTransformer: Layout Generation and Completion with Self-attention
Authors: Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava
Abstract summary: We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects. We propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements. Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary of primitives per layout.
Score: 105.21138914859804
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects. Most complex scenes, natural or human-designed, can be expressed as a meaningful arrangement of simpler compositional graphical primitives. Generating a new layout or extending an existing layout requires understanding the relationships between these primitives. To do this, we propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements and generate novel layouts in a given domain. Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary of primitives per layout. Furthermore, our analyses show that the model is able to automatically capture the semantic properties of the primitives. We propose simple improvements in both representation of layout primitives, as well as training methods to demonstrate competitive performance in very diverse data domains such as object bounding boxes in natural images(COCO bounding box), documents (PubLayNet), mobile applications (RICO dataset) as well as 3D shapes (Part-Net). Code and other materials will be made available at https://kampta.github.io/layout.

Related papers

SLayR: Scene Layout Generation with Rectified Flow [10.449737374910619]
We introduce SLayR, a novel transformer-based model for text-to-image generation. We show that our method sets a new state of the art for achieving both at the same time, while being at least 3x times smaller in the number of parameters.
arXiv Detail & Related papers (2024-12-06T12:58:58Z)
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation [78.21134311493303]
Diffusion models have been recognized for their ability to generate images that are not only visually appealing but also of high artistic quality. Previous methods primarily focus on UNet-based models (e.g., SD1.5 and SDXL), and limited effort has explored Multimodal Diffusion Transformers (MM-DiTs) Inherit the advantages of MM-DiT, we use a separate set network weights to process the image and text modalities. We contribute a large-scale layout dataset, named LayoutSAM, which includes 2.7 million image-text pairs and 10.7 million entities.
arXiv Detail & Related papers (2024-12-05T04:09:47Z)
A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions [50.469491454128246]
We use text as the guidance to create graphic layouts, i.e., Text-to-labeled, aiming to lower the design barriers. Text-to-labeled is a challenging task, because it needs to consider the implicit, combined, and incomplete constraints from text. We present a two-stage approach, named parse-then-place, to address this problem.
arXiv Detail & Related papers (2023-08-24T10:37:00Z)
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation. Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation [147.81509219686419]
We propose a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. Next, we propose IterInpaint, a new baseline that generates foreground and background regions step-by-step via inpainting. We show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order.
arXiv Detail & Related papers (2023-04-13T16:58:33Z)
PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements. We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers. A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z)
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
Geometry Aligned Variational Transformer for Image-conditioned Layout Generation [38.747175229902396]
We propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image. First, self-attention mechanism is adopted to model the contextual relationship within layout elements, while cross-attention mechanism is used to fuse the visual information of conditional images. We construct a large-scale advertisement poster layout designing dataset with delicate layout and saliency map annotations.
arXiv Detail & Related papers (2022-09-02T07:19:12Z)
Constrained Graphic Layout Generation via Latent Optimization [17.05026043385661]
We generate graphic layouts that can flexibly incorporate design semantics, either specified implicitly or explicitly by a user. Our approach builds on a generative layout model based on a Transformer architecture, and formulates the layout generation as a constrained optimization problem. We show in the experiments that our approach is capable of generating realistic layouts in both constrained and unconstrained generation tasks with a single model.
arXiv Detail & Related papers (2021-08-02T13:04:11Z)
Generative Layout Modeling using Constraint Graphs [37.78500605563527]
We propose a new generative model for layout generation. First, we generate the layout elements as nodes in a layout graph. Second, we compute constraints between layout elements as edges in the layout graph. Third, we solve for the final layout using constrained optimization.
arXiv Detail & Related papers (2020-11-26T18:18:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.