UniControl: A Unified Diffusion Model for Controllable Visual Generation
In the Wild
- URL: http://arxiv.org/abs/2305.11147v3
- Date: Thu, 2 Nov 2023 17:59:06 GMT
- Title: UniControl: A Unified Diffusion Model for Controllable Visual Generation
In the Wild
- Authors: Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan
Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun
Fu, Ran Xu
- Abstract summary: We introduce UniControl, a new generative foundation model for controllable condition-to-image (C2I) tasks.
UniControl consolidates a wide array of C2I tasks within a singular framework, while still allowing for arbitrary language prompts.
trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities.
- Score: 166.25327094261038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving machine autonomy and human control often represent divergent
objectives in the design of interactive AI systems. Visual generative
foundation models such as Stable Diffusion show promise in navigating these
goals, especially when prompted with arbitrary languages. However, they often
fall short in generating images with spatial, structural, or geometric
controls. The integration of such controls, which can accommodate various
visual conditions in a single unified model, remains an unaddressed challenge.
In response, we introduce UniControl, a new generative foundation model that
consolidates a wide array of controllable condition-to-image (C2I) tasks within
a singular framework, while still allowing for arbitrary language prompts.
UniControl enables pixel-level-precise image generation, where visual
conditions primarily influence the generated structures and language prompts
guide the style and context. To equip UniControl with the capacity to handle
diverse visual conditions, we augment pretrained text-to-image diffusion models
and introduce a task-aware HyperNet to modulate the diffusion models, enabling
the adaptation to different C2I tasks simultaneously. Trained on nine unique
C2I tasks, UniControl demonstrates impressive zero-shot generation abilities
with unseen visual conditions. Experimental results show that UniControl often
surpasses the performance of single-task-controlled methods of comparable model
sizes. This control versatility positions UniControl as a significant
advancement in the realm of controllable visual generation.
Related papers
- CAR: Controllable Autoregressive Modeling for Visual Generation [100.33455832783416]
Controllable AutoRegressive Modeling (CAR) is a novel, plug-and-play framework that integrates conditional control into multi-scale latent variable modeling.
CAR progressively refines and captures control representations, which are injected into each autoregressive step of the pre-trained model to guide the generation process.
Our approach demonstrates excellent controllability across various types of conditions and delivers higher image quality compared to previous methods.
arXiv Detail & Related papers (2024-10-07T00:55:42Z) - AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation [24.07613591217345]
Linguistic control enables effective content creation, but struggles with fine-grained control over image generation.
AnyControl develops a novel Multi-Control framework that extracts a unified multi-modal embedding to guide the generation process.
This approach enables a holistic understanding of user inputs, and produces high-quality, faithful results under versatile control signals.
arXiv Detail & Related papers (2024-06-27T07:40:59Z) - ControlVAR: Exploring Controllable Visual Autoregressive Modeling [48.66209303617063]
Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs)
Challenges such as expensive computational cost, high inference latency, and difficulties of integration with large language models (LLMs) have necessitated exploring alternatives to DMs.
This paper introduces Controlmore, a novel framework that explores pixel-level controls in visual autoregressive modeling for flexible and efficient conditional generation.
arXiv Detail & Related papers (2024-06-14T06:35:33Z) - Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance [36.50036055679903]
Recent controllable generation approaches bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules.
This work presents Ctrl-X, a simple framework for T2I diffusion controlling structure and appearance without additional training or guidance.
arXiv Detail & Related papers (2024-06-11T17:59:01Z) - OmniControlNet: Dual-stage Integration for Conditional Image Generation [61.1432268643639]
We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method.
Our proposed OmniControlNet consolidates 1) the condition generation by a single multi-tasking dense prediction algorithm under the task embedding guidance and 2) the image generation process for different conditioning types under the textual embedding guidance.
arXiv Detail & Related papers (2024-06-09T18:03:47Z) - FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation [99.4649330193233]
Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps.
We propose a novel Flexible and Efficient method, FlexEControl, for controllable T2I generation.
arXiv Detail & Related papers (2024-05-08T06:09:11Z) - FreeControl: Training-Free Spatial Control of Any Text-to-Image
Diffusion Model with Any Condition [41.92032568474062]
FreeControl is a training-free approach for controllable T2I generation.
It supports multiple conditions, architectures, and checkpoints simultaneously.
It achieves competitive synthesis quality with training-based approaches.
arXiv Detail & Related papers (2023-12-12T18:59:14Z) - Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models [82.19740045010435]
We introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls and global controls.
Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models.
Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability.
arXiv Detail & Related papers (2023-05-25T17:59:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.