CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
- URL: http://arxiv.org/abs/2505.19114v2
- Date: Wed, 28 May 2025 03:34:54 GMT
- Title: CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
- Authors: Hui Zhang, Dexiang Hong, Maoke Yang, Yutao Cheng, Zhao Zhang, Jie Shao, Xinglong Wu, Zuxuan Wu, Yu-Gang Jiang
- Abstract summary: CreatiDesign is a systematic solution for automated graphic design covering both model architecture and dataset construction. First, we design a unified multi-condition driven architecture that enables flexible and precise integration of heterogeneous design elements. Furthermore, to ensure that each condition precisely controls its designated image region, we propose a multimodal attention mask mechanism.
- Score: 69.83433430133302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graphic design plays a vital role in visual communication across advertising, marketing, and multimedia entertainment. Prior work has explored automated graphic design generation using diffusion models, aiming to streamline creative workflows and democratize design capabilities. However, complex graphic design scenarios require accurately adhering to design intent specified by multiple heterogeneous user-provided elements (\eg images, layouts, and texts), which pose multi-condition control challenges for existing methods. Specifically, previous single-condition control models demonstrate effectiveness only within their specialized domains but fail to generalize to other conditions, while existing multi-condition methods often lack fine-grained control over each sub-condition and compromise overall compositional harmony. To address these limitations, we introduce CreatiDesign, a systematic solution for automated graphic design covering both model architecture and dataset construction. First, we design a unified multi-condition driven architecture that enables flexible and precise integration of heterogeneous design elements with minimal architectural modifications to the base diffusion model. Furthermore, to ensure that each condition precisely controls its designated image region and to avoid interference between conditions, we propose a multimodal attention mask mechanism. Additionally, we develop a fully automated pipeline for constructing graphic design datasets, and introduce a new dataset with 400K samples featuring multi-condition annotations, along with a comprehensive benchmark. Experimental results show that CreatiDesign outperforms existing models by a clear margin in faithfully adhering to user intent.
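The multimodal attention mask mechanism described in the abstract (each condition controls only its designated image region, without interfering with other conditions) can be sketched as follows. This is a hypothetical illustration, not the paper's actual implementation: the function name, token layout, and mask semantics are assumptions for exposition.

```python
import numpy as np

def build_multimodal_attention_mask(num_image_tokens, region_of_token, cond_token_groups):
    """Build a boolean attention mask (True = attention allowed).

    num_image_tokens  : number of image latent tokens
    region_of_token   : region id assigned to each image token (None = background)
    cond_token_groups : list of (region_id, num_tokens), one entry per condition
    """
    total_cond = sum(n for _, n in cond_token_groups)
    size = num_image_tokens + total_cond
    mask = np.zeros((size, size), dtype=bool)

    # Image tokens attend freely to each other, preserving global
    # compositional coherence across the whole canvas.
    mask[:num_image_tokens, :num_image_tokens] = True

    offset = num_image_tokens
    for region_id, n in cond_token_groups:
        block = slice(offset, offset + n)
        # Condition tokens attend only within their own block,
        # preventing interference between different conditions.
        mask[block, block] = True
        for i in range(num_image_tokens):
            if region_of_token[i] == region_id:
                # Bidirectional attention between a condition and
                # its designated image region only.
                mask[i, block] = True
                mask[block, i] = True
        offset += n
    return mask
```

For example, with 4 image tokens split across two regions and two conditions, image tokens in region 0 can attend to condition 0's tokens but not condition 1's, and the two condition blocks cannot attend to each other. In practice such a mask would be passed to the attention operator of the diffusion transformer.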
Related papers
- Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation [72.44384066166147]
Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains. Existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures. We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch.
arXiv Detail & Related papers (2025-07-24T09:17:41Z) - IGD: Instructional Graphic Design with Multimodal Layer Generation [83.31320209596991]
Two-stage methods that rely primarily on layout generation lack creativity and intelligence, leaving graphic design labor-intensive. We propose the instructional graphic designer (IGD) to swiftly generate multimodal layers with editable flexibility from natural language instructions alone.
arXiv Detail & Related papers (2025-07-14T04:31:15Z) - Multi-View Depth Consistent Image Generation Using Generative AI Models: Application on Architectural Design of University Buildings [20.569648863933285]
We propose a novel three-stage consistent image generation framework using generative AI models. We employ ControlNet as the backbone and optimize it to accommodate multi-view inputs of architectural shoebox models. Experimental results demonstrate that the proposed framework can generate multi-view architectural images with consistent style and structural coherence.
arXiv Detail & Related papers (2025-03-05T00:16:09Z) - DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation [21.910447939103385]
We propose DiffDesign, a controllable diffusion model with meta priors for efficient interior design generation. Specifically, we utilize the generative priors of a 2D diffusion model pre-trained on a large image dataset as our rendering backbone. We further guide the denoising process by disentangling cross-attention control over design attributes, such as appearance, pose, and size, and introduce an optimal transfer-based alignment module to enforce view consistency.
arXiv Detail & Related papers (2024-11-25T11:36:34Z) - GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts. We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design [39.809852329070466]
This paper introduces the COLE system - a hierarchical generation framework designed to address these challenges.
COLE can transform a vague intention prompt into a high-quality multi-layered graphic design, while also supporting flexible editing based on user input.
arXiv Detail & Related papers (2023-11-28T17:22:17Z) - LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.