UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation
- URL: http://arxiv.org/abs/2512.08897v1
- Date: Tue, 09 Dec 2025 18:38:44 GMT
- Title: UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation
- Authors: Zeyang Liu, Le Wang, Sanping Zhou, Yuxuan Wu, Xiaolong Sun, Gang Hua, Haoxiang Li
- Abstract summary: We propose UniLayDiff, a Unified Diffusion Transformer for content-aware layout generation tasks. We employ a Multi-Modal Diffusion Transformer framework to capture the complex interplay between the background image, layout elements, and diverse constraints. Experiments demonstrate that UniLayDiff achieves state-of-the-art performance on tasks ranging from unconditional to various conditional generation.
- Score: 54.38636515750502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that blend seamlessly with a given background image. The variety of real-world applications makes it highly challenging to develop a single model that unifies the diverse range of input-constrained generation sub-tasks, such as those conditioned on element types, sizes, or their relationships. Current methods either address only a subset of these tasks or require separate model parameters for different conditions, failing to offer a truly unified solution. In this paper, we propose UniLayDiff, a Unified Diffusion Transformer that, for the first time, addresses various content-aware layout generation tasks with a single, end-to-end trainable model. Specifically, we treat layout constraints as a distinct modality and employ a Multi-Modal Diffusion Transformer framework to capture the complex interplay between the background image, layout elements, and diverse constraints. Moreover, we integrate relation constraints by fine-tuning the model with LoRA after pretraining it on the other tasks. This schema not only achieves unified conditional generation but also enhances overall layout quality. Extensive experiments demonstrate that UniLayDiff achieves state-of-the-art performance on tasks ranging from unconditional to various conditional generation and, to the best of our knowledge, is the first model to unify the full range of content-aware layout generation tasks.
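The abstract names two mechanisms: joint attention over background-image tokens, layout-element tokens, and constraint tokens treated as a distinct modality, and LoRA fine-tuning to add relation constraints after pretraining. The sketch below is a minimal reading of those two ideas in PyTorch, not the authors' implementation; the module names, dimensions, per-modality projection layout, and the choice of which layer to wrap with LoRA are all assumptions.

```python
# Minimal sketch of (1) an MM-DiT-style joint-attention block that treats
# layout constraints as a third modality, and (2) a LoRA wrapper that lets
# the block be fine-tuned for relation constraints with the base weights
# frozen. Names and sizes are illustrative, not the paper's code.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank (LoRA) update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # start as a zero (identity) update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))


class JointAttentionBlock(nn.Module):
    """Each modality gets its own QKV projection, but attention runs over
    the concatenated token sequence of all modalities."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # separate projections per modality: image, layout elements, constraints
        self.qkv = nn.ModuleDict({
            m: nn.Linear(dim, 3 * dim) for m in ("image", "layout", "constraint")
        })
        # nn.MultiheadAttention adds its own internal projections; it simply
        # stands in here for the shared attention step.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, tokens):
        qs, ks, vs, lengths = [], [], [], []
        for name, t in tokens.items():
            q, k, v = self.qkv[name](t).chunk(3, dim=-1)
            qs.append(q); ks.append(k); vs.append(v); lengths.append(t.shape[1])
        q, k, v = torch.cat(qs, 1), torch.cat(ks, 1), torch.cat(vs, 1)
        fused, _ = self.attn(q, k, v)        # joint attention across modalities
        fused = self.out(fused)
        outs = fused.split(lengths, dim=1)   # route tokens back to their modality
        return dict(zip(tokens.keys(), outs))


# Hypothetical usage: after pretraining, wrap a projection with LoRA and
# fine-tune only the low-rank weights on relation-constrained data.
block = JointAttentionBlock()
block.out = LoRALinear(block.out, rank=8)
x = {m: torch.randn(2, n, 512)
     for m, n in [("image", 64), ("layout", 10), ("constraint", 4)]}
y = block(x)                                 # same keys and shapes as the input
```

Per-modality QKV projections feeding one attention call over the concatenated sequence mirrors the MM-DiT pattern the abstract refers to; an actual diffusion block would additionally carry timestep conditioning and separate feed-forward branches per modality.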
Related papers
- VINO: A Unified Visual Generator with Interleaved OmniModal Context [36.71641694179164]
VINO is a unified visual generator that performs image and video generation and editing within a single framework. Instead of relying on task-specific models or independent modules for each modality, VINO uses a shared diffusion backbone.
arXiv Detail & Related papers (2026-01-05T18:56:34Z) - Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach [99.80480649258557]
DiTFuse is an instruction-driven framework that performs semantics-aware fusion within a single model. Experiments on public IVIF, MFF, and MEF benchmarks confirm superior quantitative and qualitative performance, sharper textures, and better semantic retention.
arXiv Detail & Related papers (2025-12-08T05:04:54Z) - CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design [69.83433430133302]
CreatiDesign is a systematic solution for automated graphic design covering both model architecture and dataset construction. First, we design a unified multi-condition driven architecture that enables flexible and precise integration of heterogeneous design elements. Furthermore, to ensure that each condition precisely controls its designated image region, we propose a multimodal attention mask mechanism.
arXiv Detail & Related papers (2025-05-25T12:14:23Z) - UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing [59.590505989071175]
Text-to-Image (T2I) diffusion models have shown impressive results in generating visually compelling images following user prompts. We introduce UniVG, a generalist diffusion model capable of supporting a diverse range of image generation tasks with a single set of weights.
arXiv Detail & Related papers (2025-03-16T21:11:25Z) - UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer [24.159791066104358]
We introduce a DiT-based multi-conditional controllable generative framework capable of handling any combination of conditions. Specifically, we introduce a novel MMDiT Attention mechanism and incorporate a trainable LoRA module. We also propose a new pipeline to construct SubjectSpatial200K, the first dataset designed for multi-conditional generative tasks.
arXiv Detail & Related papers (2025-03-12T11:22:47Z) - EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks. The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm. We evaluate its effectiveness across diverse tasks on established benchmarks, showing performance competitive with various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z) - ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer [40.32254040909614]
We propose ACE, an All-round Creator and Editor, for visual generation tasks.
We first introduce a unified condition format termed Long-context Condition Unit (LCU).
We then propose a novel Transformer-based diffusion model that uses LCU as input, aiming for joint training across various generation and editing tasks.
arXiv Detail & Related papers (2024-09-30T17:56:27Z) - Show-o: One Single Transformer to Unify Multimodal Understanding and Generation [71.24909962718128]
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities.
arXiv Detail & Related papers (2024-08-22T16:32:32Z) - Unifying Layout Generation with a Decoupled Diffusion Model [26.659337441975143]
Layout generation is a crucial task for reducing the burden of heavy-duty graphic design work for formatted scenes, e.g., publications, documents, and user interfaces (UIs).
We propose a layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model.
Our proposed LDGM can generate layouts either from scratch or conditional on arbitrary available attributes.
arXiv Detail & Related papers (2023-03-09T05:53:32Z) - DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer [2.0483033421034142]
We introduce DLT, a joint discrete-continuous diffusion model.
DLT has a flexible conditioning mechanism that allows for conditioning on any given subset of all the layout component classes, locations, and sizes.
Our method outperforms state-of-the-art generative models on various layout generation datasets with respect to different metrics and conditioning settings.
arXiv Detail & Related papers (2023-03-07T09:30:43Z) - LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction [37.6871815321083]
Conditional graphic layout generation is a challenging task that has not been well-studied yet.
We propose a constraint serialization scheme, a sequence-to-sequence transformation, and a decoding space restriction strategy; a toy serialization sketch appears after this list.
Experiments demonstrate that LayoutFormer++ outperforms existing approaches on all the tasks, achieving both better generation quality and fewer constraint violations.
arXiv Detail & Related papers (2022-08-17T02:43:23Z)
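The LayoutFormer++ entry above mentions serializing heterogeneous layout constraints into a sequence for a sequence-to-sequence model. The snippet below is only a rough illustration of what such a serialization could look like; the token vocabulary, attribute set, and discretization are assumptions, not LayoutFormer++'s actual scheme.

```python
# Toy illustration (an assumption, not LayoutFormer++'s actual scheme) of
# serializing partially specified layout constraints into a flat token
# sequence that a sequence-to-sequence layout model could consume.
MASK = "<mask>"  # hypothetical placeholder for unspecified attributes


def serialize_constraints(elements):
    """elements: list of dicts with optional 'type', 'width', 'height', 'x', 'y'
    keys in [0, 1]; unknown attributes become MASK tokens the decoder must fill."""
    tokens = ["<bos>"]
    for el in elements:
        tokens.append(el.get("type", MASK))
        for key in ("width", "height", "x", "y"):
            value = el.get(key)
            # discretize continuous geometry into a small integer vocabulary
            tokens.append(str(round(value * 127)) if value is not None else MASK)
        tokens.append("<sep>")
    tokens.append("<eos>")
    return tokens


# Example: one element with a known type and width, one with only its type.
print(serialize_constraints([
    {"type": "title", "width": 0.8},
    {"type": "button"},
]))
```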