FreeControl: Training-Free Spatial Control of Any Text-to-Image
Diffusion Model with Any Condition
- URL: http://arxiv.org/abs/2312.07536v1
- Date: Tue, 12 Dec 2023 18:59:14 GMT
- Title: FreeControl: Training-Free Spatial Control of Any Text-to-Image
Diffusion Model with Any Condition
- Authors: Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin
Li, Bolei Zhou
- Abstract summary: FreeControl is a training-free approach for controllable T2I generation.
It supports multiple conditions, architectures, and checkpoints simultaneously.
It achieves competitive synthesis quality with training-based approaches.
- Score: 41.92032568474062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent approaches such as ControlNet offer users fine-grained spatial control
over text-to-image (T2I) diffusion models. However, auxiliary modules have to
be trained for each type of spatial condition, model architecture, and
checkpoint, putting them at odds with the diverse intents and preferences a
human designer would like to convey to the AI models during the content
creation process. In this work, we present FreeControl, a training-free
approach for controllable T2I generation that supports multiple conditions,
architectures, and checkpoints simultaneously. FreeControl designs structure
guidance to facilitate structure alignment with a guidance image, and
appearance guidance to enable appearance sharing between images generated
from the same seed. Extensive qualitative and quantitative experiments
demonstrate the superior performance of FreeControl across a variety of
pre-trained T2I models. In particular, FreeControl enables convenient
training-free control over many different architectures and checkpoints,
handles challenging input conditions on which most existing training-free
methods fail, and achieves synthesis quality competitive with training-based
approaches.
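
The abstract describes two sampling-time energies rather than a learned module. The Python sketch below shows one way such training-free guidance could be folded into a generic denoising loop; the model and scheduler interfaces, the extract_features helper, the energy definitions, and the weights s_struct and s_app are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch (not the authors' implementation) of training-free guided
# sampling in the spirit of FreeControl, assuming a generic epsilon-prediction
# diffusion model and a hypothetical scheduler exposing .timesteps and .step().
import torch

def guided_sample(model, extract_features, structure_target, scheduler,
                  prompt_emb, latents, s_struct=0.5, s_app=0.2):
    """Denoising loop with two extra energy terms folded into the score."""
    latents_app = latents.clone()  # sibling trajectory sharing the same seed
    for t in scheduler.timesteps:
        latents = latents.detach().requires_grad_(True)

        # Standard text-conditioned noise prediction.
        eps = model(latents, t, prompt_emb)

        # Structure guidance: pull intermediate features toward precomputed
        # features of the guidance image (structure_target maps step -> tensor).
        feats = extract_features(model, latents, t, prompt_emb)
        e_struct = (feats - structure_target[int(t)]).pow(2).mean()

        # Appearance guidance: keep pooled feature statistics close to a
        # sibling generation from the same seed without structure control.
        with torch.no_grad():
            eps_app = model(latents_app, t, prompt_emb)
            feats_app = extract_features(model, latents_app, t, prompt_emb)
        e_app = (feats.mean(dim=(-2, -1)) - feats_app.mean(dim=(-2, -1))).pow(2).mean()

        # Classifier-guidance-style perturbation of the noise prediction;
        # no model weights are trained or modified.
        grad = torch.autograd.grad(s_struct * e_struct + s_app * e_app, latents)[0]
        eps_guided = eps + grad

        latents = scheduler.step(eps_guided, t, latents.detach())
        latents_app = scheduler.step(eps_app, t, latents_app)
    return latents

Any differentiable feature extractor and energy could be substituted here; the point is that both terms steer sampling for an arbitrary pre-trained checkpoint without training auxiliary modules.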
Related papers
- ControlNeXt: Powerful and Efficient Control for Image and Video Generation [59.62289489036722]
We propose ControlNeXt: a powerful and efficient method for controllable image and video generation.
We first design a more straightforward and efficient architecture, replacing heavy additional branches with a lightweight structure that adds minimal cost.
For training, we reduce the number of learnable parameters by up to 90% compared to the alternatives.
arXiv Detail & Related papers (2024-08-12T11:41:18Z) - AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation [24.07613591217345]
Linguistic control enables effective content creation, but struggles with fine-grained control over image generation.
AnyControl develops a novel Multi-Control framework that extracts a unified multi-modal embedding to guide the generation process.
This approach enables a holistic understanding of user inputs, and produces high-quality, faithful results under versatile control signals.
arXiv Detail & Related papers (2024-06-27T07:40:59Z) - Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance [36.50036055679903]
Recent controllable generation approaches bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules.
This work presents Ctrl-X, a simple framework that controls structure and appearance for T2I diffusion without additional training or guidance.
arXiv Detail & Related papers (2024-06-11T17:59:01Z) - OmniControlNet: Dual-stage Integration for Conditional Image Generation [61.1432268643639]
We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method.
Our proposed OmniControlNet consolidates 1) the condition generation by a single multi-tasking dense prediction algorithm under the task embedding guidance and 2) the image generation process for different conditioning types under the textual embedding guidance.
arXiv Detail & Related papers (2024-06-09T18:03:47Z) - FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation [99.4649330193233]
Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps.
We propose a novel Flexible and Efficient method, FlexEControl, for controllable T2I generation.
arXiv Detail & Related papers (2024-05-08T06:09:11Z) - Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models [82.19740045010435]
We introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls and global controls.
Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models.
Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability.
arXiv Detail & Related papers (2023-05-25T17:59:58Z) - UniControl: A Unified Diffusion Model for Controllable Visual Generation
In the Wild [166.25327094261038]
We introduce UniControl, a new generative foundation model for controllable condition-to-image (C2I) tasks.
UniControl consolidates a wide array of C2I tasks within a singular framework, while still allowing for arbitrary language prompts.
Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities.
arXiv Detail & Related papers (2023-05-18T17:41:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.