LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
- URL: http://arxiv.org/abs/2506.02697v1
- Date: Tue, 03 Jun 2025 09:47:03 GMT
- Title: LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
- Authors: Yuxuan Wu, Le Wang, Sanping Zhou, Mengnan Liu, Gang Hua, Haoxiang Li
- Abstract summary: Controllable layout generation aims to create plausible visual arrangements of element bounding boxes within a graphic design. We propose to carry out layout generation through retrieving by conditions and reference-guided generation. Our method successfully produces high-quality layouts that meet the given conditions and outperforms existing state-of-the-art models.
- Score: 34.39449499558055
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Controllable layout generation aims to create plausible visual arrangements of element bounding boxes within a graphic design according to certain optional constraints, such as the type or position of a specific component. While recent diffusion or flow-matching models have achieved considerable advances in multifarious conditional generation tasks, there remains considerable room for generating optimal arrangements under given conditions. In this work, we propose to carry out layout generation through retrieving by conditions and reference-guided generation. Specifically, we retrieve appropriate layout templates according to given conditions as references. The references are then utilized to guide the denoising or flow-based transport process. By retrieving layouts compatible with the given conditions, we can uncover the potential information not explicitly provided in the given condition. Such an approach offers more effective guidance to the model during the generation process, in contrast to previous models that feed the condition to the model and let the model infer the unprovided layout attributes directly. Meanwhile, we design a condition-modulated attention that selectively absorbs retrieval knowledge, adapting to the difference between retrieved templates and given conditions. Extensive experiment results show that our method successfully produces high-quality layouts that meet the given conditions and outperforms existing state-of-the-art models. Code will be released upon acceptance.
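As a minimal illustration of the retrieve-then-guide idea in the abstract, the sketch below encodes each layout element as `[class_id, x, y, w, h]` and a condition as the same structure with `None` for unspecified attributes. All names, the nearest-neighbor retrieval, and the scalar blend are illustrative assumptions, not the paper's actual implementation (which uses learned retrieval over layout templates and a condition-modulated attention inside a diffusion/flow model).

```python
# Hypothetical sketch of LayoutRAG-style retrieval-guided generation.
# A layout is a list of elements, each [class_id, x, y, w, h]; a condition
# uses None for attributes the user left unspecified.

def retrieve_template(condition, template_bank):
    """Return the template whose known attributes best match the condition."""
    def distance(template):
        d = 0.0
        for elem_c, elem_t in zip(condition, template):
            for c, t in zip(elem_c, elem_t):
                if c is not None:          # compare only specified attributes
                    d += (c - t) ** 2
        return d
    return min(template_bank, key=distance)

def guided_step(model_pred, template, condition, guidance=0.5):
    """One reference-guided update: blend the model's prediction with the
    retrieved template on unspecified attributes, and hard-set the attributes
    the condition fixes (a crude stand-in for condition-modulated attention)."""
    out = []
    for elem_p, elem_t, elem_c in zip(model_pred, template, condition):
        row = []
        for p, t, c in zip(elem_p, elem_t, elem_c):
            if c is not None:
                row.append(c)                      # condition is authoritative
            else:
                row.append((1 - guidance) * p + guidance * t)
        out.append(row)
    return out
```

In the actual method, the retrieved reference guides the denoising or flow-based transport trajectory across many steps, and a learned attention decides how much retrieval knowledge to absorb per attribute; this sketch collapses that into a single scalar `guidance` weight.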
Related papers
- Diffusion Models with Double Guidance: Generate with aggregated datasets [18.0878149546412]
Large-scale datasets for training high-performance generative models are often prohibitively expensive, especially when associated attributes or annotations must be provided. This presents a significant challenge for conditional generative modeling when the multiple attributes are used jointly as conditions. We propose a novel generative approach, Diffusion Model with Double Guidance, which enables precise conditional generation even when no training samples contain all conditions simultaneously.
arXiv Detail & Related papers (2025-05-19T14:59:35Z)
- Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
We propose a unified model to handle a broad range of layout generation tasks.
The model is based on continuous diffusion models.
Experiment results show that LACE produces high-quality layouts.
arXiv Detail & Related papers (2024-02-07T11:12:41Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z)
- LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models [50.73105631853759]
We present a novel generative model named LayoutDiffusion for automatic layout generation.
It learns to reverse a mild forward process, in which layouts become increasingly chaotic with the growth of forward steps.
It enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods.
arXiv Detail & Related papers (2023-03-21T04:41:02Z)
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation [27.955214767628107]
Controllable layout generation aims at synthesizing plausible arrangement of element bounding boxes with optional constraints.
In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models.
Our model, named LayoutDM, naturally handles the structured layout data in the discrete representation and learns to progressively infer a noiseless layout from the initial input.
arXiv Detail & Related papers (2023-03-14T17:59:47Z)
- Unifying Layout Generation with a Decoupled Diffusion Model [26.659337441975143]
Layout generation is a crucial task for reducing the burden of heavy-duty graphic design work in formatted scenes, e.g., publications, documents, and user interfaces (UIs).
We propose a layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model.
Our proposed LDGM can generate layouts either from scratch or conditional on arbitrary available attributes.
arXiv Detail & Related papers (2023-03-09T05:53:32Z)
- Maximum Likelihood on the Joint (Data, Condition) Distribution for Solving Ill-Posed Problems with Conditional Flow Models [0.0]
I describe a trick for training flow models using a prescribed rule as a surrogate for maximum likelihood.
I demonstrate these properties on easily visualized toy problems, then use the method to successfully generate class-conditional images.
arXiv Detail & Related papers (2022-08-24T21:50:25Z)
- GRIT: Generative Role-filler Transformers for Document-level Event Entity Extraction [134.5580003327839]
We introduce a generative transformer-based encoder-decoder framework (GRIT) to model context at the document level.
We evaluate our approach on the MUC-4 dataset, and show that our model performs substantially better than prior work.
arXiv Detail & Related papers (2020-08-21T01:07:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.