LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
- URL: http://arxiv.org/abs/2506.02697v1
- Date: Tue, 03 Jun 2025 09:47:03 GMT
- Title: LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
- Authors: Yuxuan Wu, Le Wang, Sanping Zhou, Mengnan Liu, Gang Hua, Haoxiang Li
- Abstract summary: Controllable layout generation aims to create plausible visual arrangements of element bounding boxes within a graphic design. We propose to carry out layout generation through retrieval by conditions and reference-guided generation. Our method successfully produces high-quality layouts that meet the given conditions and outperforms existing state-of-the-art models.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Controllable layout generation aims to create plausible visual arrangements of element bounding boxes within a graphic design according to certain optional constraints, such as the type or position of a specific component. While recent diffusion or flow-matching models have achieved considerable advances in multifarious conditional generation tasks, there remains considerable room for generating optimal arrangements under given conditions. In this work, we propose to carry out layout generation through retrieval by conditions and reference-guided generation. Specifically, we retrieve appropriate layout templates according to given conditions as references. The references are then utilized to guide the denoising or flow-based transport process. By retrieving layouts compatible with the given conditions, we can uncover the potential information not explicitly provided in the given condition. Such an approach offers more effective guidance to the model during the generation process, in contrast to previous models that feed the condition to the model and let the model infer the unprovided layout attributes directly. Meanwhile, we design a condition-modulated attention that selectively absorbs retrieval knowledge, adapting to the difference between retrieved templates and given conditions. Extensive experimental results show that our method successfully produces high-quality layouts that meet the given conditions and outperforms existing state-of-the-art models. Code will be released upon acceptance.
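The abstract's two-stage idea (retrieve templates compatible with a partial condition, then use the best match as a reference to fill in unspecified attributes) can be sketched as follows. This is an illustrative toy, not the authors' code; all names (`Element`, `retrieve`, `guide`) and the scoring rule are hypothetical, and the real method guides a denoising or flow-based transport process rather than copying boxes directly.

```python
# Hypothetical sketch of retrieval-by-conditions + reference-guided completion.
from dataclasses import dataclass

@dataclass
class Element:
    etype: str    # e.g. "title", "image", "text"
    box: tuple    # (x, y, w, h), normalized to [0, 1]

def condition_score(template, required_types):
    """Score a template by how well it covers the required element types."""
    have = [e.etype for e in template]
    covered = sum(t in have for t in required_types)
    return covered - 0.1 * abs(len(have) - len(required_types))

def retrieve(library, required_types, k=1):
    """Stage 1: return the k templates most compatible with the condition."""
    return sorted(library, key=lambda t: -condition_score(t, required_types))[:k]

def guide(required_types, reference):
    """Stage 2: fill unspecified attributes (boxes) from the retrieved reference."""
    by_type = {e.etype: e.box for e in reference}
    default = (0.1, 0.1, 0.8, 0.2)  # fallback box when the reference lacks the type
    return [Element(t, by_type.get(t, default)) for t in required_types]

library = [
    [Element("title", (0.1, 0.05, 0.8, 0.1)), Element("image", (0.1, 0.2, 0.8, 0.5))],
    [Element("title", (0.1, 0.05, 0.8, 0.1)), Element("text", (0.1, 0.2, 0.8, 0.3)),
     Element("image", (0.1, 0.55, 0.8, 0.4))],
]
condition = ["title", "text", "image"]   # only element types are given
ref = retrieve(library, condition)[0]    # retrieval uncovers unprovided attributes
layout = guide(condition, ref)           # reference-guided completion
```

The point of the retrieval step is visible even in this toy: the condition specifies only element types, while the retrieved reference supplies plausible positions and sizes that the condition never stated.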
Related papers
- ACD: Direct Conditional Control for Video Diffusion Models via Attention Supervision [62.41380823195191]
We propose Attention-Conditional Diffusion, a framework for direct conditional control in video diffusion models via attention supervision. ACD achieves better controllability by aligning the model's attention maps with external control signals. Experiments on benchmark video generation datasets demonstrate that ACD delivers superior alignment with conditioning inputs.
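A minimal sketch of the attention-supervision idea in this summary: add an auxiliary loss that pulls a normalized attention map toward an external control mask. The function name and the MSE choice are illustrative assumptions, not taken from the paper.

```python
# Hypothetical attention-alignment loss: penalize attention mass that
# falls outside an externally supplied control mask.
import numpy as np

def attention_alignment_loss(attn, control_mask, eps=1e-8):
    """MSE between a normalized attention map and a normalized control mask.

    attn:         (H, W) non-negative attention weights for one token/head
    control_mask: (H, W) binary or soft mask marking where attention should go
    """
    attn = attn / (attn.sum() + eps)               # normalize to a distribution
    mask = control_mask / (control_mask.sum() + eps)
    return float(np.mean((attn - mask) ** 2))

# Attention concentrated on the masked region incurs near-zero loss;
# attention concentrated elsewhere incurs a larger one.
mask = np.zeros((4, 4)); mask[:2, :] = 1.0
good = np.zeros((4, 4)); good[:2, :] = 1.0
bad = np.zeros((4, 4)); bad[2:, :] = 1.0
```

In a real training loop this term would be added, with some weight, to the usual diffusion denoising objective.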
arXiv Detail & Related papers (2025-12-24T16:24:18Z)
- Composition and Alignment of Diffusion Models using Constrained Learning [79.36736636241564]
Diffusion models have become prevalent in generative modeling due to their ability to sample from complex distributions. Two commonly used methods are: (i) alignment, which involves fine-tuning a diffusion model to align it with a reward; and (ii) composition, which combines several pre-trained diffusion models, each emphasizing a desirable attribute in the generated outputs. We propose a constrained optimization framework that unifies alignment and composition of diffusion models by enforcing that the aligned model satisfies reward constraints and/or remains close to (potentially multiple) pre-trained models.
arXiv Detail & Related papers (2025-08-26T15:06:30Z)
- Diffusion Models with Double Guidance: Generate with aggregated datasets [18.0878149546412]
Large-scale datasets for training high-performance generative models are often prohibitively expensive, especially when associated attributes or annotations must be provided. This presents a significant challenge for conditional generative modeling when the multiple attributes are used jointly as conditions. We propose a novel generative approach, Diffusion Model with Double Guidance, which enables precise conditional generation even when no training samples contain all conditions simultaneously.
arXiv Detail & Related papers (2025-05-19T14:59:35Z)
- Imagining the Unseen: Generative Location Modeling for Object Placement [49.71690795831461]
We develop a generative location model that learns to predict plausible bounding boxes for an object. Our approach first tokenizes the image and target object class, then decodes bounding box coordinates through an autoregressive transformer. Empirical evaluations reveal that our generative location model achieves superior placement accuracy on the OPA dataset.
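The tokenization step this summary describes can be illustrated with a small sketch: continuous box coordinates are quantized into discrete tokens so an autoregressive transformer can decode them one at a time. The bin count (256) and the (x, y, w, h) layout are assumptions for illustration, not details from the paper.

```python
# Hypothetical coordinate quantization for autoregressive box decoding.
def box_to_tokens(box, bins=256):
    """Quantize (x, y, w, h) in [0, 1] into integer tokens in [0, bins)."""
    return [min(int(v * bins), bins - 1) for v in box]

def tokens_to_box(tokens, bins=256):
    """Map tokens back to coordinates at bin centers."""
    return [(t + 0.5) / bins for t in tokens]

tokens = box_to_tokens((0.25, 0.10, 0.50, 0.30))
box = tokens_to_box(tokens)   # recovered up to quantization error (<= 1/(2*bins))
```

During generation, a transformer would emit these four tokens sequentially, each conditioned on the image/class tokens and the coordinates decoded so far.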
arXiv Detail & Related papers (2024-10-17T14:00:41Z)
- Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
We propose a unified model to handle a broad range of layout generation tasks.
The model is based on continuous diffusion models.
Experiment results show that LACE produces high-quality layouts.
arXiv Detail & Related papers (2024-02-07T11:12:41Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z)
- LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models [50.73105631853759]
We present a novel generative model named LayoutDiffusion for automatic layout generation.
It learns to reverse a mild forward process, in which layouts become increasingly chaotic with the growth of forward steps.
It enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods.
arXiv Detail & Related papers (2023-03-21T04:41:02Z)
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation [27.955214767628107]
Controllable layout generation aims at synthesizing plausible arrangement of element bounding boxes with optional constraints.
In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models.
Our model, named LayoutDM, naturally handles the structured layout data in the discrete representation and learns to progressively infer a noiseless layout from the initial input.
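The progressive inference this summary describes, inferring a noiseless layout from an initial input in a discrete state space, can be caricatured with a toy mask-and-fill loop. Everything here is a stand-in: the real denoiser is a learned model, whereas `propose` below just returns the most frequent training token; names like `progressive_infer` are hypothetical.

```python
# Toy sketch of iterative refinement over discrete layout tokens:
# start from masked tokens, keep condition-fixed tokens, fill the rest
# in over several steps.
import random
from collections import Counter

MASK = "<mask>"

def progressive_infer(seq_len, condition, propose, steps=4, seed=0):
    """condition: {position: token} constraints; propose(i, seq) -> token."""
    rng = random.Random(seed)
    seq = [condition.get(i, MASK) for i in range(seq_len)]
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # unmask a random subset each step, as in iterative refinement
        for i in rng.sample(masked, max(1, len(masked) // 2)):
            seq[i] = propose(i, seq)
    # final pass: resolve any remaining masks
    return [propose(i, seq) if t == MASK else t for i, t in enumerate(seq)]

corpus = ["text", "text", "image", "title"]
most_common = Counter(corpus).most_common(1)[0][0]
propose = lambda i, seq: most_common   # trivial stand-in for a learned denoiser
out = progressive_infer(6, {0: "title"}, propose)
```

Because condition tokens are never masked, constraints survive every refinement step, which is how such models handle conditional generation without retraining.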
arXiv Detail & Related papers (2023-03-14T17:59:47Z)
- Unifying Layout Generation with a Decoupled Diffusion Model [26.659337441975143]
Layout generation is a crucial task for reducing the burden of heavy-duty graphic design work in formatted scenes, e.g., publications, documents, and user interfaces (UIs).
We propose a layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model.
Our proposed LDGM can generate layouts either from scratch or conditional on arbitrary available attributes.
arXiv Detail & Related papers (2023-03-09T05:53:32Z)
- Maximum Likelihood on the Joint (Data, Condition) Distribution for Solving Ill-Posed Problems with Conditional Flow Models [0.0]
I describe a trick for training flow models using a prescribed rule as a surrogate for maximum likelihood.
I demonstrate these properties on easily visualized toy problems, then use the method to successfully generate class-conditional images.
arXiv Detail & Related papers (2022-08-24T21:50:25Z)
- GRIT: Generative Role-filler Transformers for Document-level Event Entity Extraction [134.5580003327839]
We introduce a generative transformer-based encoder-decoder framework (GRIT) to model context at the document level.
We evaluate our approach on the MUC-4 dataset, and show that our model performs substantially better than prior work.
arXiv Detail & Related papers (2020-08-21T01:07:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.