CCM: Adding Conditional Controls to Text-to-Image Consistency Models
- URL: http://arxiv.org/abs/2312.06971v1
- Date: Tue, 12 Dec 2023 04:16:03 GMT
- Title: CCM: Adding Conditional Controls to Text-to-Image Consistency Models
- Authors: Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha
- Abstract summary: We consider alternative strategies for adding ControlNet-like conditional control to Consistency Models.
A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training.
We study these three solutions across various conditional controls, including edge, depth, human pose, low-resolution image and masked image.
- Score: 89.75377958996305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, how to add new conditional controls to pretrained CMs has not been explored. In this technical report, we consider alternative strategies for adding ControlNet-like conditional control to CMs and present three significant findings. 1) ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic controls but struggles with low-level detail and realism control. 2) CMs serve as an independent class of generative models, based on which ControlNet can be trained from scratch using the Consistency Training proposed by Song et al. 3) A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training, allowing for the swift transfer of DM-based ControlNet to CMs. We study these three solutions across various conditional controls, including edge, depth, human pose, low-resolution image, and masked image, with text-to-image latent consistency models.
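The paper's second finding invokes the Consistency Training objective of Song et al.: two noisy versions of the same clean latent, at adjacent noise levels, should map to the same output, so the student network is regressed onto an EMA teacher. Below is a minimal PyTorch-style sketch of one such training step with a trainable control branch attached to a frozen consistency model; every interface here (the `cm`, `controlnet`, and `control_residuals` names and call signatures) is a hypothetical placeholder, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def consistency_training_step(cm, cm_ema, controlnet, controlnet_ema,
                              x0, text_emb, control, sigmas, opt):
    """One Consistency Training step for a ControlNet-style branch.

    cm / cm_ema                 -- frozen consistency model and its EMA copy
    controlnet / controlnet_ema -- trainable control branch and its EMA copy
    x0      -- clean latents, shape (B, C, H, W)
    control -- conditioning image (edge map, depth, pose, ...)
    sigmas  -- 1-D tensor of discretized noise levels, ascending
    """
    B = x0.shape[0]
    # Sample adjacent noise levels (sigma_n, sigma_{n+1}) per example,
    # perturbing the SAME clean latent with the SAME Gaussian noise.
    n = torch.randint(0, len(sigmas) - 1, (B,), device=x0.device)
    noise = torch.randn_like(x0)
    x_lo = x0 + sigmas[n].view(B, 1, 1, 1) * noise
    x_hi = x0 + sigmas[n + 1].view(B, 1, 1, 1) * noise

    # Student: consistency model with the trainable control branch attached.
    residuals = controlnet(x_hi, sigmas[n + 1], text_emb, control)
    pred = cm(x_hi, sigmas[n + 1], text_emb, control_residuals=residuals)

    # Teacher: EMA copies evaluated at the lower noise level, no gradients.
    with torch.no_grad():
        residuals_ema = controlnet_ema(x_lo, sigmas[n], text_emb, control)
        target = cm_ema(x_lo, sigmas[n], text_emb,
                        control_residuals=residuals_ema)

    # Self-consistency loss: points on the same probability-flow ODE
    # trajectory should map to the same endpoint.
    loss = F.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.detach()
```

Because the target comes from an EMA teacher rather than a pretrained diffusion model, this objective needs no DM in the loop, which is what allows a ControlNet to be trained from scratch on the CM itself.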
Related papers
- DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models [55.42794740244581]
We introduce DC (Decouple)-ControlNet, a framework for multi-condition image generation.
The core idea behind DC-ControlNet is to decouple control conditions, transforming global control into a hierarchical system.
For interactions between elements, we introduce the Inter-Element Controller, which accurately handles multi-element interactions.
arXiv Detail & Related papers (2025-02-20T18:01:02Z) - C3: Learning Congestion Controllers with Formal Certificates [14.750230453127413]
C3 is a new learning framework for congestion control that integrates the concept of formal certification in the learning loop.
C3-trained controllers provide both adaptability and worst-case reliability across a range of network conditions.
arXiv Detail & Related papers (2024-12-14T18:02:50Z) - DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation [63.63429658282696]
We propose DynamicControl, which supports dynamic combinations of diverse control signals.
We show that DynamicControl is superior to existing methods in terms of controllability, generation quality and composability under various conditional controls.
arXiv Detail & Related papers (2024-12-04T11:54:57Z) - ControlVAR: Exploring Controllable Visual Autoregressive Modeling [48.66209303617063]
Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs).
Challenges such as high computational cost, high inference latency, and difficulty of integration with large language models (LLMs) have necessitated exploring alternatives to DMs.
This paper introduces ControlVAR, a novel framework that explores pixel-level controls in visual autoregressive modeling for flexible and efficient conditional generation.
arXiv Detail & Related papers (2024-06-14T06:35:33Z) - Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning [115.50132185963139]
CM3Leon is a decoder-only multi-modal language model capable of generating and infilling both text and images.
It is the first multi-modal model trained with a recipe adapted from text-only language models.
CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods.
arXiv Detail & Related papers (2023-09-05T21:27:27Z) - Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models [82.19740045010435]
We introduce Uni-ControlNet, a unified framework that allows different local controls and global controls to be used simultaneously.
Unlike existing methods, Uni-ControlNet requires fine-tuning only two additional adapters on top of frozen pre-trained text-to-image diffusion models.
Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability; a sketch of this frozen-backbone, two-adapter setup follows the list.
arXiv Detail & Related papers (2023-05-25T17:59:58Z)
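The Uni-ControlNet entry above describes a recipe that is easy to picture in code: freeze the pretrained diffusion backbone and train only two small adapters, one for spatial (local) controls and one for global controls. The sketch below illustrates that training setup in PyTorch; the adapter architectures, channel sizes, and the stand-in backbone are assumptions for illustration, not the paper's actual modules.

```python
import torch
import torch.nn as nn

class LocalAdapter(nn.Module):
    """Encodes spatial controls (edge/depth/pose maps) into residual features."""
    def __init__(self, in_ch=3, feat_ch=320):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, control_image):
        return self.net(control_image)

class GlobalAdapter(nn.Module):
    """Projects a global condition (e.g. an image embedding) into extra tokens."""
    def __init__(self, emb_dim=768, ctx_dim=768, n_tokens=4):
        super().__init__()
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim
        self.proj = nn.Linear(emb_dim, ctx_dim * n_tokens)

    def forward(self, global_emb):
        B = global_emb.shape[0]
        return self.proj(global_emb).view(B, self.n_tokens, self.ctx_dim)

# Stand-in for the pretrained text-to-image backbone: frozen, no gradients.
backbone = nn.Conv2d(4, 4, 3, padding=1)
for p in backbone.parameters():
    p.requires_grad_(False)

# Only the two adapters are optimized, which is what keeps the
# fine-tuning cost low relative to retraining the full model.
local_adapter, global_adapter = LocalAdapter(), GlobalAdapter()
optimizer = torch.optim.AdamW(
    list(local_adapter.parameters()) + list(global_adapter.parameters()),
    lr=1e-4,
)
```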
This list is automatically generated from the titles and abstracts of the papers on this site.