OminiControl2: Efficient Conditioning for Diffusion Transformers
- URL: http://arxiv.org/abs/2503.08280v1
- Date: Tue, 11 Mar 2025 10:50:14 GMT
- Title: OminiControl2: Efficient Conditioning for Diffusion Transformers
- Authors: Zhenxiong Tan, Qiaochu Xue, Xingyi Yang, Songhua Liu, Xinchao Wang
- Abstract summary: We present OminiControl2, an efficient framework for image-conditional image generation. OminiControl2 introduces two key innovations: (1) a dynamic compression strategy that streamlines conditional inputs by preserving only the most semantically relevant tokens during generation, and (2) a conditional feature reuse mechanism that computes condition token features only once and reuses them across denoising steps.
- Score: 68.3243031301164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained control of text-to-image diffusion transformer (DiT) models remains a critical challenge for practical deployment. While recent advances such as OminiControl have enabled controllable generation from diverse control signals, these methods become computationally inefficient when handling long conditional inputs. We present OminiControl2, an efficient framework for image-conditional image generation. OminiControl2 introduces two key innovations: (1) a dynamic compression strategy that streamlines conditional inputs by preserving only the most semantically relevant tokens during generation, and (2) a conditional feature reuse mechanism that computes condition token features only once and reuses them across denoising steps. These architectural improvements preserve the original framework's parameter efficiency and multi-modal versatility while dramatically reducing computational costs. Our experiments demonstrate that OminiControl2 reduces conditional processing overhead by over 90% compared to its predecessor, achieving an overall 5.9$\times$ speedup in multi-conditional generation scenarios. This efficiency enables the practical implementation of complex, multi-modal control for high-quality image synthesis with DiT models.
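The two ideas named in the abstract (keeping only the most relevant condition tokens, and computing condition features once for all denoising steps) can be illustrated with a toy sketch. This is not the paper's implementation: `compress_tokens`, `ConditionEncoder`, `denoise_step`, `sample`, and the relevance scores are all hypothetical names and values invented for illustration, and the sketch assumes the condition features depend on neither the denoising step nor the current latent.

```python
from typing import List, Sequence


def compress_tokens(tokens: Sequence[float], scores: Sequence[float], keep: int) -> List[float]:
    """Keep the `keep` highest-scoring tokens, preserving their original order.

    The scores are hypothetical relevance values; the paper's actual
    selection criterion is not described in the abstract.
    """
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


class ConditionEncoder:
    """Stand-in for the expensive condition-token computation; counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, tokens: Sequence[float]) -> List[float]:
        self.calls += 1
        return [2.0 * t for t in tokens]


def denoise_step(latent: Sequence[float], cond_features: Sequence[float], step: int) -> List[float]:
    """Toy update: move each latent value towards the mean condition feature."""
    target = sum(cond_features) / len(cond_features)
    return [x + (target - x) / (step + 1) for x in latent]


def sample(latent: Sequence[float], cond_tokens: Sequence[float],
           encoder: ConditionEncoder, num_steps: int, reuse: bool) -> List[float]:
    """Run the toy denoising loop, optionally reusing the condition features."""
    cached = None
    current = list(latent)
    for step in reversed(range(num_steps)):
        if reuse:
            if cached is None:
                cached = encoder(cond_tokens)  # computed once, on the first step
            features = cached
        else:
            features = encoder(cond_tokens)  # recomputed at every step
        current = denoise_step(current, features, step)
    return current


tokens = [0.1, 0.9, 0.4, 0.7]
scores = [0.2, 0.8, 0.1, 0.5]  # hypothetical relevance scores
kept = compress_tokens(tokens, scores, keep=2)

baseline_enc, reused_enc = ConditionEncoder(), ConditionEncoder()
out_baseline = sample([0.0, 1.0], kept, baseline_enc, num_steps=4, reuse=False)
out_reused = sample([0.0, 1.0], kept, reused_enc, num_steps=4, reuse=True)
```

In this toy setting both runs produce the same result, but the baseline encoder is called once per step while the reusing variant is called once in total. The speedup figures quoted in the abstract come from the authors' experiments, not from this sketch.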
Related papers
- EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer [15.879712910520801]
We propose EasyControl, a novel framework designed to unify condition-guided diffusion transformers with high efficiency and flexibility.
Our framework is built on three key innovations. First, we introduce a lightweight Condition Injection LoRA Module.
Second, we propose a Position-Aware Training Paradigm. This approach standardizes input conditions to fixed resolutions, allowing the generation of images with arbitrary aspect ratios and flexible resolutions.
Third, we develop a Causal Attention Mechanism combined with the KV Cache technique, adapted for conditional generation tasks.
arXiv Detail & Related papers (2025-03-10T08:07:17Z)
- Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers [55.87192133758051]
Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency.
We propose DiffCR, a dynamic DiT inference framework with differentiable compression ratios.
arXiv Detail & Related papers (2024-12-22T02:04:17Z)
- E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling [17.62612090885471]
ECAR (Efficient Continuous Auto-Regressive Image Generation via Multistage Modeling) is presented.
It operates by generating tokens at increasing resolutions while simultaneously denoising the image at each stage.
ECAR achieves image quality comparable to DiT (Peebles & Xie, 2023) while requiring 10$\times$ fewer FLOPs and running 5$\times$ faster to generate a 256$\times$256 image.
arXiv Detail & Related papers (2024-12-18T18:59:53Z)
- DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation [63.63429658282696]
We propose DynamicControl, which supports dynamic combinations of diverse control signals.
We show that DynamicControl is superior to existing methods in terms of controllability, generation quality and composability under various conditional controls.
arXiv Detail & Related papers (2024-12-04T11:54:57Z)
- OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures. OminiControl addresses these limitations through three key innovations.
arXiv Detail & Related papers (2024-11-22T17:55:15Z)
- OmniControlNet: Dual-stage Integration for Conditional Image Generation [61.1432268643639]
We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method.
Our proposed OmniControlNet consolidates 1) the condition generation by a single multi-tasking dense prediction algorithm under the task embedding guidance and 2) the image generation process for different conditioning types under the textual embedding guidance.
arXiv Detail & Related papers (2024-06-09T18:03:47Z)
- FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation [99.4649330193233]
Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps.
We propose a novel Flexible and Efficient method, FlexEControl, for controllable T2I generation.
arXiv Detail & Related papers (2024-05-08T06:09:11Z)
- Prompt Guided Transformer for Multi-Task Dense Prediction [14.815576352301322]
We introduce a lightweight task-conditional model called Prompt Guided Transformer to optimize performance and model parameters.
Our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and maintains a significant balance between performance and parameter size.
arXiv Detail & Related papers (2023-07-28T07:25:57Z)