Composition-aware Graphic Layout GAN for Visual-textual Presentation Designs
- URL: http://arxiv.org/abs/2205.00303v1
- Date: Sat, 30 Apr 2022 16:42:13 GMT
- Title: Composition-aware Graphic Layout GAN for Visual-textual Presentation Designs
- Authors: Min Zhou, Chenchen Xu, Ye Ma, Tiezheng Ge, Yuning Jiang and Weiwei Xu
- Abstract summary: We study the graphic layout generation problem of producing high-quality visual-textual presentation designs for given images.
We propose a deep generative model, dubbed as composition-aware graphic layout GAN (CGL-GAN), to synthesize layouts based on the global and spatial visual contents of input images.
- Score: 24.29890251913182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the graphic layout generation problem of producing
high-quality visual-textual presentation designs for given images. We note that
image compositions, which contain not only global semantics but also spatial
information, would largely affect layout results. Hence, we propose a deep
generative model, dubbed as composition-aware graphic layout GAN (CGL-GAN), to
synthesize layouts based on the global and spatial visual contents of input
images. To obtain training images from images that already contain manually
designed graphic layout data, previous work suggests masking design elements
(e.g., texts and embellishments) as model inputs, which inevitably leaves hints
of the ground truth. We study the misalignment between the training inputs
(with hint masks) and test inputs (without masks), and design a novel domain
alignment module (DAM) to narrow this gap. For training, we built a large-scale
layout dataset which consists of 60,548 advertising posters with annotated
layout information. To evaluate the generated layouts, we propose three novel
metrics according to aesthetic intuitions. Through both quantitative and
qualitative evaluations, we demonstrate that the proposed model can synthesize
high-quality graphic layouts according to image compositions.
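The masking strategy criticized above can be illustrated with a small sketch. This is not the CGL-GAN preprocessing. It is a minimal example that assumes three things: images are tiny grayscale grids stored as Python lists, element boxes are hypothetical `(x0, y0, x1, y1)` half-open pixel coordinates, and "masking" means painting each box with a constant fill value. It shows why such training inputs leave hints: the masked region reproduces the ground-truth box exactly, while a test image has no such region.

```python
from typing import List, Tuple

# Assumed representations for this sketch only.
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates
Image = List[List[int]]          # grayscale image stored as rows of pixel values


def mask_elements(image: Image, boxes: List[Box], fill: int = 0) -> Image:
    """Return a copy of `image` with every design-element box painted with `fill`."""
    out = [row[:] for row in image]  # copy rows so the input image is not modified
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = fill
    return out


def recover_box(masked: Image, fill: int = 0) -> Box:
    """Return the tight bounding box of all `fill` pixels.

    For a single masked element on content that never contains `fill`, this
    equals the ground-truth box, i.e. the hint a model could exploit in training.
    Raises ValueError when no pixel equals `fill` (e.g. an unmasked test image).
    """
    ys = [y for y, row in enumerate(masked) for v in row if v == fill]
    xs = [x for row in masked for x, v in enumerate(row) if v == fill]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


if __name__ == "__main__":
    poster = [[200] * 8 for _ in range(6)]  # 8x6 flat background, no zero pixels
    gt_box = (2, 1, 6, 3)                   # one hypothetical text element
    train_input = mask_elements(poster, [gt_box])
    print(recover_box(train_input) == gt_box)  # the mask leaks the layout
```

At inference the input image carries no mask, so `recover_box(poster)` raises `ValueError`: this is the train/test mismatch that the DAM is described as narrowing.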
Related papers
- Self-supervised Photographic Image Layout Representation Learning [5.009120058742792]
We develop an autoencoder-based network architecture capable of compressing heterogeneous layout graphs into precise, dimensionally reduced layout representations.
We introduce the LODB dataset, which features a broader range of layout categories and richer semantics.
Our extensive experimentation on this dataset demonstrates the superior performance of our approach in the realm of photographic image layout representation learning.
(2024-03-06T14:28:53Z)
- Dense Text-to-Image Generation with Attention Modulation [49.287458275920514]
Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions.
We propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions.
We achieve visual results of similar quality to those of models specifically trained with layout conditions.
(2023-08-24T17:59:01Z)
- Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
(2023-08-15T13:53:52Z)
- LayoutGPT: Compositional Visual Planning and Generation with Large Language Models [98.81962282674151]
Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions.
We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language.
(2023-05-24T17:56:16Z)
- PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims to allocate space on a given canvas for predefined elements.
We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers.
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
(2023-03-28T12:48:36Z)
- Unsupervised Domain Adaption with Pixel-level Discriminator for Image-aware Layout Generation [24.625282719753915]
This paper focuses on using the GAN-based model conditioned on image contents to generate advertising poster graphic layouts.
It combines unsupervised domain adaptation techniques to design a GAN with a novel pixel-level discriminator (PD), called PDA-GAN, that generates graphic layouts according to image contents.
Both quantitative and qualitative evaluations demonstrate that PDA-GAN achieves state-of-the-art performance.
(2023-03-25T06:50:22Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and does not scale to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
(2022-12-19T21:57:35Z)
- SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges that arise in this new setup.
(2022-11-21T18:59:05Z)
- Geometry Aligned Variational Transformer for Image-conditioned Layout Generation [38.747175229902396]
We propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image.
First, a self-attention mechanism is adopted to model the contextual relationships among layout elements, while a cross-attention mechanism is used to fuse the visual information of conditional images.
We construct a large-scale advertisement poster layout designing dataset with delicate layout and saliency map annotations.
(2022-09-02T07:19:12Z)
- Interactive Image Synthesis with Panoptic Layout Generation [14.1026819862002]
We propose Panoptic Layout Generative Adversarial Networks (PLGAN) to address this challenge.
PLGAN employs panoptic theory, which distinguishes object categories between "stuff" with amorphous boundaries and "things" with well-defined shapes.
We experimentally compare our PLGAN with state-of-the-art layout-based models on the COCO-Stuff, Visual Genome, and Landscape datasets.
(2022-03-04T02:45:27Z)