Geometry Aligned Variational Transformer for Image-conditioned Layout
Generation
- URL: http://arxiv.org/abs/2209.00852v1
- Date: Fri, 2 Sep 2022 07:19:12 GMT
- Title: Geometry Aligned Variational Transformer for Image-conditioned Layout
Generation
- Authors: Yunning Cao, Ye Ma, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge,
Yuning Jiang
- Abstract summary: We propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates diverse layouts in an image.
First, a self-attention mechanism is adopted to model the contextual relationships within layout elements, while a cross-attention mechanism fuses the visual information of the conditional image.
We construct a large-scale advertisement poster layout design dataset with fine-grained layout and saliency map annotations.
- Score: 38.747175229902396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Layout generation is a novel task in computer vision that combines
the challenges of object localization and aesthetic appraisal, and is widely
used in advertisement, poster, and slide design. An accurate and pleasant layout
should consider both the intra-domain relationship within layout elements and
the inter-domain relationship between layout elements and the image. However,
most previous methods simply focus on image-content-agnostic layout generation,
without leveraging the complex visual information from the image. To this end,
we explore a novel paradigm entitled image-conditioned layout generation, which
aims to add text overlays to an image in a semantically coherent manner.
Specifically, we propose an Image-Conditioned Variational Transformer (ICVT)
that autoregressively generates diverse layouts in an image. First, a
self-attention mechanism is adopted to model the contextual relationships
within layout elements, while a cross-attention mechanism is used to fuse the
visual information of the conditional image. Subsequently, we take these as the
building blocks of a conditional variational autoencoder (CVAE), which demonstrates
appealing diversity. Second, to alleviate the gap between the layout element
domain and the visual domain, we design a Geometry Alignment module, in which
the geometric information of the image is aligned with the layout
representation. In addition, we construct a large-scale advertisement poster
layout design dataset with fine-grained layout and saliency map annotations.
Experimental results show that our model can adaptively generate layouts in the
non-intrusive area of the image, resulting in a harmonious layout design.
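The architecture the abstract describes can be summarized in a short sketch. Below is a minimal, hypothetical PyTorch rendering of the named ingredients: self-attention over layout tokens, cross-attention to image features, a CVAE latent head, and a geometry alignment embedding. Module names, dimensions, and wiring are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the ICVT building blocks described in the abstract.
# Names, dimensions, and wiring are assumptions, not the authors' code.
import torch
import torch.nn as nn

class GeometryAlignment(nn.Module):
    """Embed normalized (x, y, w, h) boxes of image patches with the same
    projection used for layout elements, so both domains share one
    geometric space (the paper's Geometry Alignment idea, paraphrased)."""
    def __init__(self, d_model: int = 256):
        super().__init__()
        self.box_embed = nn.Linear(4, d_model)

    def forward(self, image_tokens, patch_boxes):
        # image_tokens: (B, P, d_model), patch_boxes: (B, P, 4) in [0, 1]
        return image_tokens + self.box_embed(patch_boxes)

class ICVTDecoderLayer(nn.Module):
    """Self-attention within layout elements + cross-attention to the image."""
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, layout_tokens, image_tokens, causal_mask=None):
        # Intra-domain: contextual relationships among layout elements.
        h, _ = self.self_attn(layout_tokens, layout_tokens, layout_tokens,
                              attn_mask=causal_mask)
        x = self.norm1(layout_tokens + h)
        # Inter-domain: fuse visual information from the conditional image.
        h, _ = self.cross_attn(x, image_tokens, image_tokens)
        x = self.norm2(x + h)
        return self.norm3(x + self.ffn(x))

class CVAEHead(nn.Module):
    """Map pooled features to a Gaussian posterior and sample a latent,
    which seeds diverse autoregressive decoding."""
    def __init__(self, d_model: int = 256, d_latent: int = 64):
        super().__init__()
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)

    def forward(self, pooled):
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar
```

At generation time one would sample the latent from the prior, condition the decoder on geometry-aligned image tokens, and emit layout elements one at a time under a causal mask.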
Related papers
- LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion.
We show that LAW-Diffusion yields state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z) - PosterLayout: A New Benchmark and Approach for Content-aware
Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims to arrange pre-defined elements within the available space of a given canvas.
We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers.
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z) - Layout-Bridging Text-to-Image Synthesis [20.261873143881573]
We push for effective modeling in both text-to-image generation and layout-to-image synthesis.
We focus on learning the textual-visual semantic alignment per object in the layout to precisely incorporate the input text into the layout-to-image synthesis process.
arXiv Detail & Related papers (2022-08-12T08:21:42Z) - Composition-aware Graphic Layout GAN for Visual-textual Presentation
Designs [24.29890251913182]
We study the graphic layout generation problem of producing high-quality visual-textual presentation designs for given images.
We propose a deep generative model, dubbed as composition-aware graphic layout GAN (CGL-GAN), to synthesize layouts based on the global and spatial visual contents of input images.
arXiv Detail & Related papers (2022-04-30T16:42:13Z) - Interactive Image Synthesis with Panoptic Layout Generation [14.1026819862002]
We propose Panoptic Layout Generative Adversarial Networks (PLGAN) to address this challenge.
PLGAN employs panoptic theory which distinguishes object categories between "stuff" with amorphous boundaries and "things" with well-defined shapes.
We experimentally compare our PLGAN with state-of-the-art layout-based models on the COCO-Stuff, Visual Genome, and Landscape datasets.
arXiv Detail & Related papers (2022-03-04T02:45:27Z) - Constrained Graphic Layout Generation via Latent Optimization [17.05026043385661]
We generate graphic layouts that can flexibly incorporate design semantics, either specified implicitly or explicitly by a user.
Our approach builds on a generative layout model based on a Transformer architecture, and formulates the layout generation as a constrained optimization problem.
We show in the experiments that our approach is capable of generating realistic layouts in both constrained and unconstrained generation tasks with a single model.
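The constrained-optimization formulation above lends itself to a compact sketch: freeze a trained layout generator and run gradient descent on its latent code against differentiable constraint penalties. The generator interface and the overlap penalty below are hypothetical stand-ins for whatever constraints a user specifies.

```python
# Sketch of constrained layout generation via latent optimization.
# The generator interface and penalty are illustrative assumptions.
import torch

def overlap_penalty(boxes: torch.Tensor) -> torch.Tensor:
    """Differentiable total pairwise overlap for (N, 4) boxes in (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = boxes.unbind(-1)
    ix = (torch.min(x2[:, None], x2) - torch.max(x1[:, None], x1)).clamp(min=0)
    iy = (torch.min(y2[:, None], y2) - torch.max(y1[:, None], y1)).clamp(min=0)
    inter = ix * iy
    return inter.sum() - inter.diagonal().sum()  # drop each box's self-overlap

def optimize_latent(generator, z_dim: int = 64, steps: int = 200, lr: float = 0.05):
    """Search the latent space of a frozen generator for a layout that
    minimizes the constraint penalty; the generator is assumed to map a
    latent z to (1, N, 4) element boxes."""
    for p in generator.parameters():
        p.requires_grad_(False)          # optimize z only, not the model
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        boxes = generator(z)
        loss = overlap_penalty(boxes[0])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

Explicit user constraints (alignment, margins, fixed positions) would enter as additional penalty terms in the same loop.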
arXiv Detail & Related papers (2021-08-02T13:04:11Z) - Semantic Layout Manipulation with High-Resolution Sparse Attention [106.59650698907953]
We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
We propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.
arXiv Detail & Related papers (2020-12-14T06:50:43Z) - Scene Graph to Image Generation with Contextualized Object Layout
Refinement [92.85331019618332]
We propose a novel method to generate images from scene graphs.
Our approach improves the layout coverage by almost 20 points and drops object overlap to negligible amounts.
arXiv Detail & Related papers (2020-09-23T06:27:54Z) - LayoutTransformer: Layout Generation and Completion with Self-attention [105.21138914859804]
We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects.
We propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements.
Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary number of primitives per layout.
arXiv Detail & Related papers (2020-06-25T17:56:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.