UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation
- URL: http://arxiv.org/abs/2512.08897v1
- Date: Tue, 09 Dec 2025 18:38:44 GMT
- Title: UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation
- Authors: Zeyang Liu, Le Wang, Sanping Zhou, Yuxuan Wu, Xiaolong Sun, Gang Hua, Haoxiang Li
- Abstract summary: We propose UniLayDiff, a Unified Diffusion Transformer for content-aware layout generation tasks. We employ a Multi-Modal Diffusion Transformer framework to capture the complex interplay between the background image, layout elements, and diverse constraints. Experiments demonstrate that UniLayDiff achieves state-of-the-art performance on tasks ranging from unconditional to various conditional generation.
- Score: 54.38636515750502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that blend seamlessly with a given background image. The variety of real-world applications makes it highly challenging to develop a single model that unifies the diverse range of input-constrained generation sub-tasks, such as those conditioned on element types, sizes, or their relationships. Current methods either address only a subset of these tasks or require separate model parameters for different conditions, failing to offer a truly unified solution. In this paper, we propose UniLayDiff, a Unified Diffusion Transformer that, for the first time, addresses various content-aware layout generation tasks with a single, end-to-end trainable model. Specifically, we treat layout constraints as a distinct modality and employ a Multi-Modal Diffusion Transformer framework to capture the complex interplay between the background image, layout elements, and diverse constraints. Moreover, we integrate relation constraints by fine-tuning the model with LoRA after pretraining it on the other tasks. This schema not only achieves unified conditional generation but also enhances overall layout quality. Extensive experiments demonstrate that UniLayDiff achieves state-of-the-art performance on tasks ranging from unconditional to various conditional generation and, to the best of our knowledge, is the first model to unify the full range of content-aware layout generation tasks.
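The abstract names two mechanisms: joint attention over background-image tokens, layout-element tokens, and constraint tokens treated as a distinct modality, and LoRA fine-tuning to add relation constraints after pretraining. The sketch below is a minimal reading of those two ideas in PyTorch, not the authors' implementation; the module names, dimensions, per-modality projection layout, and the choice of which layer to wrap with LoRA are all assumptions.

```python
# Minimal sketch of (1) an MM-DiT-style joint-attention block that treats
# layout constraints as a third modality, and (2) a LoRA wrapper that lets
# the block be fine-tuned for relation constraints with the base weights
# frozen. Names and sizes are illustrative, not the paper's code.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank (LoRA) update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # start as a zero (identity) update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))


class JointAttentionBlock(nn.Module):
    """Each modality gets its own QKV projection, but attention runs over
    the concatenated token sequence of all modalities."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # separate projections per modality: image, layout elements, constraints
        self.qkv = nn.ModuleDict({
            m: nn.Linear(dim, 3 * dim) for m in ("image", "layout", "constraint")
        })
        # nn.MultiheadAttention adds its own internal projections; it simply
        # stands in here for the shared attention step.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, tokens):
        qs, ks, vs, lengths = [], [], [], []
        for name, t in tokens.items():
            q, k, v = self.qkv[name](t).chunk(3, dim=-1)
            qs.append(q); ks.append(k); vs.append(v); lengths.append(t.shape[1])
        q, k, v = torch.cat(qs, 1), torch.cat(ks, 1), torch.cat(vs, 1)
        fused, _ = self.attn(q, k, v)        # joint attention across modalities
        fused = self.out(fused)
        outs = fused.split(lengths, dim=1)   # route tokens back to their modality
        return dict(zip(tokens.keys(), outs))


# Hypothetical usage: after pretraining, wrap a projection with LoRA and
# fine-tune only the low-rank weights on relation-constrained data.
block = JointAttentionBlock()
block.out = LoRALinear(block.out, rank=8)
x = {m: torch.randn(2, n, 512)
     for m, n in [("image", 64), ("layout", 10), ("constraint", 4)]}
y = block(x)                                 # same keys and shapes as the input
```

Per-modality QKV projections feeding one attention call over the concatenated sequence mirrors the MM-DiT pattern the abstract refers to; an actual diffusion block would additionally carry timestep conditioning and separate feed-forward branches per modality.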
Related papers
- VINO: A Unified Visual Generator with Interleaved OmniModal Context [36.71641694179164]
VINO is a unified visual generator that performs image and video generation and editing within a single framework. Instead of relying on task-specific models or independent modules for each modality, VINO uses a shared diffusion backbone.
arXiv Detail & Related papers (2026-01-05T18:56:34Z) - Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach [99.80480649258557]
DiTFuse is an instruction-driven framework that performs semantics-aware fusion within a single model. Experiments on public IVIF, MFF, and MEF benchmarks confirm superior quantitative and qualitative performance, sharper textures, and better semantic retention.
arXiv Detail & Related papers (2025-12-08T05:04:54Z) - CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design [69.83433430133302]
CreatiDesign is a systematic solution for automated graphic design covering both model architecture and dataset construction. First, we design a unified multi-condition driven architecture that enables flexible and precise integration of heterogeneous design elements. Furthermore, to ensure that each condition precisely controls its designated image region, we propose a multimodal attention mask mechanism.
arXiv Detail & Related papers (2025-05-25T12:14:23Z) - UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing [59.590505989071175]
Text-to-Image (T2I) diffusion models have shown impressive results in generating visually compelling images following user prompts. We introduce UniVG, a generalist diffusion model capable of supporting a diverse range of image generation tasks with a single set of weights.
arXiv Detail & Related papers (2025-03-16T21:11:25Z) - UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer [24.159791066104358]
We introduce a DiT-based multi-conditional controllable generative framework capable of handling any combination of conditions. Specifically, we introduce a novel MMDiT Attention mechanism and incorporate a trainable LoRA module. We also propose a new pipeline to construct SubjectSpatial200K, the first dataset designed for multi-conditional generative tasks.
arXiv Detail & Related papers (2025-03-12T11:22:47Z) - EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks. The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm. We evaluate its effectiveness across diverse tasks on established benchmarks, showing performance competitive with various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z) - ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer [40.32254040909614]
We propose ACE, an All-round Creator and Editor, for visual generation tasks.
We first introduce a unified condition format termed Long-context Condition Unit (LCU).
We then propose a novel Transformer-based diffusion model that uses LCU as input, aiming for joint training across various generation and editing tasks.
arXiv Detail & Related papers (2024-09-30T17:56:27Z) - Show-o: One Single Transformer to Unify Multimodal Understanding and Generation [71.24909962718128]
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities.
arXiv Detail & Related papers (2024-08-22T16:32:32Z) - Unifying Layout Generation with a Decoupled Diffusion Model [26.659337441975143]
Layout generation is a crucial task for reducing the burden of heavy-duty graphic design work for formatted scenes, e.g., publications, documents, and user interfaces (UIs).
We propose a layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model.
Our proposed LDGM can generate layouts either from scratch or conditional on arbitrary available attributes.
arXiv Detail & Related papers (2023-03-09T05:53:32Z) - DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer [2.0483033421034142]
We introduce DLT, a joint discrete-continuous diffusion model.
DLT has a flexible conditioning mechanism that allows for conditioning on any given subset of all the layout component classes, locations, and sizes.
Our method outperforms state-of-the-art generative models on various layout generation datasets with respect to different metrics and conditioning settings.
arXiv Detail & Related papers (2023-03-07T09:30:43Z) - LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction [37.6871815321083]
Conditional graphic layout generation is a challenging task that has not been well-studied yet.
We propose a constraint serialization scheme, a sequence-to-sequence transformation, and a decoding space restriction strategy; a toy serialization sketch appears after this list.
Experiments demonstrate that LayoutFormer++ outperforms existing approaches on all the tasks, achieving both better generation quality and fewer constraint violations.
arXiv Detail & Related papers (2022-08-17T02:43:23Z)
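The LayoutFormer++ entry above mentions serializing heterogeneous layout constraints into a sequence for a sequence-to-sequence model. The snippet below is only a rough illustration of what such a serialization could look like; the token vocabulary, attribute set, and discretization are assumptions, not LayoutFormer++'s actual scheme.

```python
# Toy illustration (an assumption, not LayoutFormer++'s actual scheme) of
# serializing partially specified layout constraints into a flat token
# sequence that a sequence-to-sequence layout model could consume.
MASK = "<mask>"  # hypothetical placeholder for unspecified attributes


def serialize_constraints(elements):
    """elements: list of dicts with optional 'type', 'width', 'height', 'x', 'y'
    keys in [0, 1]; unknown attributes become MASK tokens the decoder must fill."""
    tokens = ["<bos>"]
    for el in elements:
        tokens.append(el.get("type", MASK))
        for key in ("width", "height", "x", "y"):
            value = el.get(key)
            # discretize continuous geometry into a small integer vocabulary
            tokens.append(str(round(value * 127)) if value is not None else MASK)
        tokens.append("<sep>")
    tokens.append("<eos>")
    return tokens


# Example: one element with a known type and width, one with only its type.
print(serialize_constraints([
    {"type": "title", "width": 0.8},
    {"type": "button"},
]))
```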