LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
- URL: http://arxiv.org/abs/2212.09877v4
- Date: Mon, 30 Sep 2024 11:49:50 GMT
- Title: LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
- Authors: Ning Yu, Chia-Chih Chen, Zeyuan Chen, Rui Meng, Gang Wu, Paul Josel, Juan Carlos Niebles, Caiming Xiong, Ran Xu,
- Abstract summary: Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' desires.
- Score: 80.61492265221817
- License:
- Abstract: Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground content. We propose LayoutDETR that inherits the high quality and realism from generative modeling, while reformulating content-aware requirements as a detection problem: we learn to detect in a background image the reasonable locations, scales, and spatial relations for multimodal foreground elements in a layout. Our solution sets a new state-of-the-art performance for layout generation on public benchmarks and on our newly-curated ad banner dataset. We integrate our solution into a graphical system that facilitates user studies, and show that users prefer our designs over baselines by significant margins. Code, models, dataset, and demos are available at https://github.com/salesforce/LayoutDETR.
Related papers
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously.
To support instruction-tuning of out model, we construct two extensive text logo datasets, which are 5x more larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer [46.67415676699221]
We introduce a framework that balances content and graphic features to generate high-quality, visually appealing layouts.
Specifically, we design an adaptive factor that optimize the model's awareness of the layout generation space.
We also introduce a graphic condition, the saliency bounding box, to bridge the modality difference between images in the visual domain and layouts in the geometric parameter domain.
arXiv Detail & Related papers (2024-07-21T17:58:21Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - PosterLlama: Bridging Design Ability of Langauge Model to Contents-Aware Layout Generation [6.855409699832414]
PosterLlama is a network designed for generating visually and textually coherent layouts.
Our evaluations demonstrate that PosterLlama outperforms existing methods in producing authentic and content-aware layouts.
It supports an unparalleled range of conditions, including but not limited to unconditional layout generation, element conditional layout generation, layout completion, among others, serving as a highly versatile user manipulation tool.
arXiv Detail & Related papers (2024-04-01T08:46:35Z) - Desigen: A Pipeline for Controllable Design Template Generation [69.51563467689795]
Desigen is an automatic template creation pipeline which generates background images as well as layout elements over the background.
We propose two techniques to constrain the saliency distribution and reduce the attention weight in desired regions during the background generation process.
Experiments demonstrate that the proposed pipeline generates high-quality templates comparable to human designers.
arXiv Detail & Related papers (2024-03-14T04:32:28Z) - Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation [30.101562738257588]
Content-aware graphic layout generation aims to automatically arrange visual elements along with a given content, such as an e-commerce product image.
We show that a simple retrieval augmentation can significantly improve the generation quality.
Our model, which is named Retrieval-Augmented Layout Transformer (RALF), retrieves nearest neighbor layout examples based on an input image and feeds these results into an autoregressive generator.
arXiv Detail & Related papers (2023-11-22T18:59:53Z) - PosterLayout: A New Benchmark and Approach for Content-aware
Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements.
We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers.
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z) - The Layout Generation Algorithm of Graphic Design Based on
Transformer-CVAE [8.052709336750823]
This paper implemented the Transformer model and conditional variational autoencoder (CVAE) to the graphic design layout generation task.
It proposed an end-to-end graphic design layout generation model named LayoutT-CVAE.
Compared with the existing state-of-art models, the layout generated by ours performs better on many metrics.
arXiv Detail & Related papers (2021-10-08T13:36:02Z) - Constrained Graphic Layout Generation via Latent Optimization [17.05026043385661]
We generate graphic layouts that can flexibly incorporate design semantics, either specified implicitly or explicitly by a user.
Our approach builds on a generative layout model based on a Transformer architecture, and formulates the layout generation as a constrained optimization problem.
We show in the experiments that our approach is capable of generating realistic layouts in both constrained and unconstrained generation tasks with a single model.
arXiv Detail & Related papers (2021-08-02T13:04:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.