Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
- URL: http://arxiv.org/abs/2311.13602v4
- Date: Mon, 15 Apr 2024 23:29:51 GMT
- Title: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
- Authors: Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa
- Abstract summary: Content-aware graphic layout generation aims to automatically arrange visual elements alongside given content, such as an e-commerce product image.
We show that a simple retrieval augmentation can significantly improve the generation quality.
Our model, which is named Retrieval-Augmented Layout Transformer (RALF), retrieves nearest neighbor layout examples based on an input image and feeds these results into an autoregressive generator.
- Score: 30.101562738257588
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Content-aware graphic layout generation aims to automatically arrange visual elements alongside given content, such as an e-commerce product image. In this paper, we argue that the current layout generation approaches suffer from the limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our model, which is named Retrieval-Augmented Layout Transformer (RALF), retrieves nearest neighbor layout examples based on an input image and feeds these results into an autoregressive generator. Our model can apply retrieval augmentation to various controllable generation tasks and yield high-quality layouts within a unified architecture. Our extensive experiments show that RALF successfully generates content-aware layouts in both constrained and unconstrained settings and significantly outperforms the baselines.
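The retrieval-augmentation idea in the abstract can be sketched in a few lines: embed the input image, find the k nearest-neighbor examples in a layout database, and pass their layouts to the generator as extra context. The following is a minimal illustrative sketch, not RALF's actual implementation; the feature vectors, the toy database, and the averaging "generator" stub are all invented for demonstration.

```python
import numpy as np

def retrieve_nearest_layouts(query_feat, db_feats, db_layouts, k=2):
    """Return the k layouts whose image features are closest to the query."""
    # Euclidean distance between the query feature and every database feature.
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    return [db_layouts[i] for i in nearest]

def generate_layout(retrieved_layouts):
    """Stub generator: average the retrieved boxes as a crude layout prior.
    A real system would feed the retrieved layouts into a trained
    autoregressive model instead."""
    boxes = np.stack([np.asarray(l, dtype=float) for l in retrieved_layouts])
    return boxes.mean(axis=0)  # shape: (num_elements, 4) as (x, y, w, h)

# Toy database: 3 images, each paired with a 2-element layout of (x, y, w, h) boxes.
db_feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
db_layouts = [
    [[0.1, 0.1, 0.3, 0.2], [0.5, 0.6, 0.4, 0.3]],
    [[0.2, 0.1, 0.3, 0.2], [0.5, 0.7, 0.4, 0.2]],
    [[0.8, 0.8, 0.1, 0.1], [0.0, 0.0, 0.2, 0.2]],
]

query = np.array([0.4, 0.4])  # feature of the input image
neighbors = retrieve_nearest_layouts(query, db_feats, db_layouts, k=2)
layout = generate_layout(neighbors)
print(layout.shape)  # one averaged (x, y, w, h) box per element
```

The key design point the paper exploits is that retrieval sidesteps limited training data: the generator conditions on real layouts similar to the input image rather than having to model the full high-dimensional layout space from scratch.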
Related papers
- Group Diffusion Transformers are Unsupervised Multitask Learners [49.288489286276146]
Group Diffusion Transformers (GDTs) are a novel framework that unifies diverse visual generation tasks.
GDTs build upon diffusion transformers with minimal architectural modifications by concatenating self-attention tokens across images.
We evaluate GDTs on a benchmark featuring over 200 instructions across 30 distinct visual generation tasks.
arXiv Detail & Related papers (2024-10-19T07:53:15Z) - LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer [46.67415676699221]
We introduce a framework that balances content and graphic features to generate high-quality, visually appealing layouts.
Specifically, we design an adaptive factor that optimizes the model's awareness of the layout generation space.
We also introduce a graphic condition, the saliency bounding box, to bridge the modality difference between images in the visual domain and layouts in the geometric parameter domain.
arXiv Detail & Related papers (2024-07-21T17:58:21Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - PosterLlama: Bridging Design Ability of Language Model to Contents-Aware Layout Generation [6.855409699832414]
PosterLlama is a network designed for generating visually and textually coherent layouts.
Our evaluations demonstrate that PosterLlama outperforms existing methods in producing authentic and content-aware layouts.
It supports an unparalleled range of conditions, including unconditional layout generation, element-conditional layout generation, and layout completion, serving as a highly versatile user-manipulation tool.
arXiv Detail & Related papers (2024-04-01T08:46:35Z) - LayoutDM: Transformer-based Diffusion Model for Layout Generation [0.6445605125467572]
A Transformer-based diffusion model (DDPM) is proposed to generate high-quality images.
A Transformer-based conditional Layout Denoiser is proposed to generate samples from noised layout data.
Our method outperforms state-of-the-art generative models in terms of quality and diversity.
arXiv Detail & Related papers (2023-05-04T05:51:35Z) - Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation [147.81509219686419]
We propose a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape.
Next, we propose IterInpaint, a new baseline that generates foreground and background regions step-by-step via inpainting.
We show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order.
arXiv Detail & Related papers (2023-04-13T16:58:33Z) - LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z) - Hierarchical Semantic Regularization of Latent Spaces in StyleGANs [53.98170188547775]
We propose a Hierarchical Semantic Regularizer (HSR) which aligns the hierarchical representations learnt by the generator to corresponding powerful features learnt by pretrained networks on large amounts of data.
HSR is shown to not only improve generator representations but also the linearity and smoothness of the latent style spaces, leading to the generation of more natural-looking style-edited images.
arXiv Detail & Related papers (2022-08-07T16:23:33Z) - BLT: Bidirectional Layout Transformer for Controllable Layout Generation [27.239276265955954]
We introduce BLT, a bidirectional layout transformer for conditional layout generation.
We verify the proposed model on multiple benchmarks with various fidelity metrics.
Our results demonstrate two key advances to the state-of-the-art layout transformer models.
arXiv Detail & Related papers (2021-12-09T18:49:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.