CCM: Adding Conditional Controls to Text-to-Image Consistency Models
- URL: http://arxiv.org/abs/2312.06971v1
- Date: Tue, 12 Dec 2023 04:16:03 GMT
- Title: CCM: Adding Conditional Controls to Text-to-Image Consistency Models
- Authors: Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha
- Abstract summary: We consider alternative strategies for adding ControlNet-like conditional control to Consistency Models.
A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training.
We study these three solutions across various conditional controls, including edge, depth, human pose, low-resolution image and masked image.
- Score: 89.75377958996305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, how to add new conditional controls to pretrained CMs has not been explored. In this technical report, we consider alternative strategies for adding ControlNet-like conditional control to CMs and present three significant findings. 1) ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic controls but struggles with low-level detail and realism control. 2) CMs serve as an independent class of generative models, based on which ControlNet can be trained from scratch using the Consistency Training proposed by Song et al. 3) A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training, allowing for the swift transfer of DM-based ControlNet to CMs. We study these three solutions across various conditional controls, including edge, depth, human pose, low-resolution image, and masked image, with text-to-image latent consistency models.
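The paper's second finding invokes the Consistency Training objective of Song et al.: two noisy versions of the same clean latent, at adjacent noise levels, should map to the same output, so the student network is regressed onto an EMA teacher. Below is a minimal PyTorch-style sketch of one such training step with a trainable control branch attached to a frozen consistency model; every interface here (the `cm`, `controlnet`, and `control_residuals` names and call signatures) is a hypothetical placeholder, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def consistency_training_step(cm, cm_ema, controlnet, controlnet_ema,
                              x0, text_emb, control, sigmas, opt):
    """One Consistency Training step for a ControlNet-style branch.

    cm / cm_ema                 -- frozen consistency model and its EMA copy
    controlnet / controlnet_ema -- trainable control branch and its EMA copy
    x0      -- clean latents, shape (B, C, H, W)
    control -- conditioning image (edge map, depth, pose, ...)
    sigmas  -- 1-D tensor of discretized noise levels, ascending
    """
    B = x0.shape[0]
    # Sample adjacent noise levels (sigma_n, sigma_{n+1}) per example,
    # perturbing the SAME clean latent with the SAME Gaussian noise.
    n = torch.randint(0, len(sigmas) - 1, (B,), device=x0.device)
    noise = torch.randn_like(x0)
    x_lo = x0 + sigmas[n].view(B, 1, 1, 1) * noise
    x_hi = x0 + sigmas[n + 1].view(B, 1, 1, 1) * noise

    # Student: consistency model with the trainable control branch attached.
    residuals = controlnet(x_hi, sigmas[n + 1], text_emb, control)
    pred = cm(x_hi, sigmas[n + 1], text_emb, control_residuals=residuals)

    # Teacher: EMA copies evaluated at the lower noise level, no gradients.
    with torch.no_grad():
        residuals_ema = controlnet_ema(x_lo, sigmas[n], text_emb, control)
        target = cm_ema(x_lo, sigmas[n], text_emb,
                        control_residuals=residuals_ema)

    # Self-consistency loss: points on the same probability-flow ODE
    # trajectory should map to the same endpoint.
    loss = F.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.detach()
```

Because the target comes from an EMA teacher rather than a pretrained diffusion model, this objective needs no DM in the loop, which is what allows a ControlNet to be trained from scratch on the CM itself.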
Related papers
- DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models [55.42794740244581]
We introduce DC (Decouple)-ControlNet, a framework for multi-condition image generation.
The core idea behind DC-ControlNet is to decouple control conditions, transforming global control into a hierarchical system.
For interactions between elements, we introduce the Inter-Element Controller, which accurately handles multi-element interactions.
arXiv Detail & Related papers (2025-02-20T18:01:02Z) - C3: Learning Congestion Controllers with Formal Certificates [14.750230453127413]
C3 is a new learning framework for congestion control that integrates the concept of formal certification in the learning loop.
C3-trained controllers provide both adaptability and worst-case reliability across a range of network conditions.
arXiv Detail & Related papers (2024-12-14T18:02:50Z) - DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation [63.63429658282696]
We propose DynamicControl, which supports dynamic combinations of diverse control signals.
We show that DynamicControl is superior to existing methods in terms of controllability, generation quality and composability under various conditional controls.
arXiv Detail & Related papers (2024-12-04T11:54:57Z) - ControlVAR: Exploring Controllable Visual Autoregressive Modeling [48.66209303617063]
Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs).
Challenges such as high computational cost, high inference latency, and difficulty of integration with large language models (LLMs) have necessitated exploring alternatives to DMs.
This paper introduces ControlVAR, a novel framework that explores pixel-level controls in visual autoregressive modeling for flexible and efficient conditional generation.
arXiv Detail & Related papers (2024-06-14T06:35:33Z) - Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning [115.50132185963139]
CM3Leon is a decoder-only multi-modal language model capable of generating and infilling both text and images.
It is the first multi-modal model trained with a recipe adapted from text-only language models.
CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods.
arXiv Detail & Related papers (2023-09-05T21:27:27Z) - Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models [82.19740045010435]
We introduce Uni-ControlNet, a unified framework that allows different local controls and global controls to be used simultaneously.
Unlike existing methods, Uni-ControlNet requires fine-tuning only two additional adapters on top of frozen pre-trained text-to-image diffusion models.
Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability; a sketch of this frozen-backbone, two-adapter setup follows the list.
arXiv Detail & Related papers (2023-05-25T17:59:58Z)
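The Uni-ControlNet entry above describes a recipe that is easy to picture in code: freeze the pretrained diffusion backbone and train only two small adapters, one for spatial (local) controls and one for global controls. The sketch below illustrates that training setup in PyTorch; the adapter architectures, channel sizes, and the stand-in backbone are assumptions for illustration, not the paper's actual modules.

```python
import torch
import torch.nn as nn

class LocalAdapter(nn.Module):
    """Encodes spatial controls (edge/depth/pose maps) into residual features."""
    def __init__(self, in_ch=3, feat_ch=320):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, control_image):
        return self.net(control_image)

class GlobalAdapter(nn.Module):
    """Projects a global condition (e.g. an image embedding) into extra tokens."""
    def __init__(self, emb_dim=768, ctx_dim=768, n_tokens=4):
        super().__init__()
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim
        self.proj = nn.Linear(emb_dim, ctx_dim * n_tokens)

    def forward(self, global_emb):
        B = global_emb.shape[0]
        return self.proj(global_emb).view(B, self.n_tokens, self.ctx_dim)

# Stand-in for the pretrained text-to-image backbone: frozen, no gradients.
backbone = nn.Conv2d(4, 4, 3, padding=1)
for p in backbone.parameters():
    p.requires_grad_(False)

# Only the two adapters are optimized, which is what keeps the
# fine-tuning cost low relative to retraining the full model.
local_adapter, global_adapter = LocalAdapter(), GlobalAdapter()
optimizer = torch.optim.AdamW(
    list(local_adapter.parameters()) + list(global_adapter.parameters()),
    lr=1e-4,
)
```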
This list is automatically generated from the titles and abstracts of the papers on this site.