RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
- URL: http://arxiv.org/abs/2502.14377v1
- Date: Thu, 20 Feb 2025 09:10:05 GMT
- Title: RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
- Authors: Ke Cao, Jing Wang, Ao Ma, Jiasong Feng, Zhanjie Zhang, Xuanhua He, Shanyuan Liu, Bo Cheng, Dawei Leng, Yuhui Yin, Jie Zhang
- Abstract summary: We propose the Relevance-Guided Efficient Controllable Generation framework, RelaCtrl.
We evaluate the relevance of each layer in the Diffusion Transformer to the control information.
We then tailor the positioning, parameter scale, and modeling capacity of the control layers to reduce unnecessary parameters and redundant computations.
- Score: 11.003945673813488
- Abstract: The Diffusion Transformer plays a pivotal role in advancing text-to-image and text-to-video generation, owing primarily to its inherent scalability. However, existing controlled diffusion transformer methods incur significant parameter and computational overheads and suffer from inefficient resource allocation due to their failure to account for the varying relevance of control information across different transformer layers. To address this, we propose the Relevance-Guided Efficient Controllable Generation framework, RelaCtrl, enabling efficient and resource-optimized integration of control signals into the Diffusion Transformer. First, we evaluate the relevance of each layer in the Diffusion Transformer to the control information by assessing the "ControlNet Relevance Score", i.e., the impact of skipping each control layer on both the quality of generation and the control effectiveness during inference. Based on the strength of the relevance, we then tailor the positioning, parameter scale, and modeling capacity of the control layers to reduce unnecessary parameters and redundant computations. Additionally, to further improve efficiency, we replace the self-attention and FFN in the commonly used copy block with the carefully designed Two-Dimensional Shuffle Mixer (TDSM), enabling efficient implementation of both the token mixer and channel mixer. Both qualitative and quantitative experimental results demonstrate that our approach achieves superior performance with only 15% of the parameters and computational complexity compared to PixArt-δ. More examples are available at https://relactrl.github.io/RelaCtrl/.
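To make the relevance-scoring idea concrete, here is a minimal sketch of the ablation loop the abstract describes: skip one control layer at a time and score it by how much quality and controllability degrade. The callables `generate`, `quality`, and `control_error`, and the weighting `alpha`, are hypothetical stand-ins for whatever generator and metrics are used; this is not the authors' code.

```python
# Hypothetical sketch of the "ControlNet Relevance Score" ablation: skip each
# control layer during inference and record how much quality and control
# fidelity degrade. Larger degradation = higher relevance of that layer.
from typing import Callable, Dict, List

def relevance_scores(
    num_control_layers: int,
    generate: Callable[[List[int]], object],   # runs inference, skipping given layers
    quality: Callable[[object], float],        # e.g. an FID-like score (lower = better)
    control_error: Callable[[object], float],  # e.g. condition-alignment error
    alpha: float = 0.5,                        # assumed weighting between the two terms
) -> Dict[int, float]:
    """Score each control layer by the damage its removal causes."""
    baseline = generate([])                    # full model, no layers skipped
    q0, c0 = quality(baseline), control_error(baseline)

    scores = {}
    for layer in range(num_control_layers):
        ablated = generate([layer])            # skip exactly one control layer
        dq = quality(ablated) - q0             # quality degradation
        dc = control_error(ablated) - c0       # controllability degradation
        scores[layer] = alpha * dq + (1 - alpha) * dc
    return scores

# Layers with low scores are candidates for smaller or removed control blocks.
```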
Related papers
- Shared DIFF Transformer [4.289692335378565]
DIFF Transformer improves attention allocation by enhancing focus on relevant context while suppressing noise.
We propose Shared DIFF Transformer, which draws on the idea of a differential amplifier by introducing a shared base matrix to model global patterns.
This design significantly reduces parameter redundancy, improves efficiency, and retains strong noise suppression capabilities.
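The summary suggests a differential-attention layer whose two branches share one base projection. The sketch below is an assumption-laden toy along those lines (a shared matrix for global patterns, low-rank branch offsets for the differential signal), not the paper's exact formulation.

```python
# Toy differential attention with a shared base projection (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def low_rank(dim: int, r: int) -> nn.Module:
    # lightweight branch-specific offset on top of the shared base matrix
    return nn.Sequential(nn.Linear(dim, r, bias=False), nn.Linear(r, dim, bias=False))

class SharedDiffAttention(nn.Module):
    def __init__(self, dim: int, r: int = 8, lam: float = 0.5):
        super().__init__()
        self.shared = nn.Linear(dim, dim, bias=False)  # shared base matrix (global patterns)
        self.q_off = nn.ModuleList(low_rank(dim, r) for _ in range(2))
        self.k_off = nn.ModuleList(low_rank(dim, r) for _ in range(2))
        self.v = nn.Linear(dim, dim, bias=False)
        self.lam = lam                                  # differential weight

    def attn_map(self, x: torch.Tensor, i: int) -> torch.Tensor:
        q = self.shared(x) + self.q_off[i](x)
        k = self.shared(x) + self.k_off[i](x)
        return F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the difference of two attention maps cancels noise common to both branches
        a = self.attn_map(x, 0) - self.lam * self.attn_map(x, 1)
        return a @ self.v(x)

x = torch.randn(2, 8, 32)
print(SharedDiffAttention(32)(x).shape)  # torch.Size([2, 8, 32])
```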
arXiv Detail & Related papers (2025-01-29T09:29:07Z)
- Adaptive Pruning of Pretrained Transformer via Differential Inclusions [48.47890215458465]
Current compression algorithms prune transformers at fixed compression ratios, requiring a unique pruning process for each ratio.
We propose pruning of pretrained transformers at any desired ratio within a single pruning stage, based on a differential inclusion for a mask parameter.
This dynamic can generate the whole regularization solution path of the mask parameter, whose support set identifies the network structure.
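As an illustration of how a differential inclusion can trace a whole sparsity path for a mask in one run, here is a toy discretization in the style of linearized Bregman dynamics; `grad_fn`, `alpha`, and `kappa` are illustrative choices, not the paper's recipe.

```python
# Toy "solution path" for a pruning mask via discretized inverse-scale-space
# dynamics: parameters enter the support one by one as the path evolves.
import numpy as np

def mask_solution_path(grad_fn, dim, steps=200, alpha=0.1, kappa=1.0):
    """Evolve an auxiliary variable z; the mask is its soft-thresholded image."""
    z = np.zeros(dim)
    path = []
    for _ in range(steps):
        m = kappa * np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)  # proximal map
        z -= alpha * grad_fn(m)         # gradient flow on the mask objective
        path.append(m.copy())
    return path                          # masks at every sparsity level along the path

# Example: quadratic objective pulling the mask toward a sparse target.
target = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
path = mask_solution_path(lambda m: m - target, dim=5)
print(np.round(path[-1], 2))  # strongest entries enter the support first
```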
arXiv Detail & Related papers (2025-01-06T06:34:52Z)
- TinyFusion: Diffusion Transformers Learned Shallow [52.96232442322824]
Diffusion Transformers have demonstrated remarkable capabilities in image generation but often come with excessive parameterization.
We present TinyFusion, a depth pruning method designed to remove redundant layers from diffusion transformers via end-to-end learning.
Experiments with DiT-XL show that TinyFusion can craft a shallow diffusion transformer at less than 7% of the pre-training cost, achieving a 2× speedup with an FID score of 2.86.
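A rough sketch of learnable depth pruning in this spirit: give every block a trainable keep-logit, relax the binary choice with a Gumbel-sigmoid gate during training, and threshold it afterwards. A sparsity penalty on the gates (not shown) would drive redundant layers out; this is a generic illustration, not TinyFusion's actual procedure.

```python
# End-to-end learnable layer removal via relaxed binary gates (illustrative).
import torch
import torch.nn as nn

class GatedDepth(nn.Module):
    def __init__(self, blocks: nn.ModuleList, tau: float = 1.0):
        super().__init__()
        self.blocks = blocks
        self.logits = nn.Parameter(torch.zeros(len(blocks)))  # learnable keep-logits
        self.tau = tau

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            if self.training:
                u = torch.rand(())
                noise = torch.log(u) - torch.log1p(-u)                  # logistic noise
                g = torch.sigmoid((self.logits[i] + noise) / self.tau)  # soft gate
            else:
                g = (self.logits[i] > 0).float()                        # hard keep/drop
            x = g * block(x) + (1 - g) * x                              # dropped layer = identity
        return x

blocks = nn.ModuleList(nn.Linear(16, 16) for _ in range(12))
model = GatedDepth(blocks).eval()
print(model(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```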
arXiv Detail & Related papers (2024-12-02T07:05:39Z)
- CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning way and fully exploiting complementarity across features, our method can achieve both high efficiency and accuracy.
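A hedged sketch of the decoupled dual-branch idea: split channels asymmetrically, run a depthwise convolution on one part (local inductive bias) and a linear attention on the rest (long-range dependencies), then fuse. The split ratio, normalization, and fusion layer are assumptions for illustration.

```python
# Asymmetric channel split into a local conv branch and a linear-attention branch.
import torch
import torch.nn as nn

def linear_attn(q, k, v):
    # efficient-attention style normalization: O(N * C^2) instead of O(N^2 * C)
    q = q.softmax(dim=-1)                 # normalize over channels
    k = k.softmax(dim=1)                  # normalize over tokens
    return q @ (k.transpose(1, 2) @ v)    # (B, N, C)

class DualBranch(nn.Module):
    def __init__(self, dim: int, local_ratio: float = 0.25):
        super().__init__()
        self.c_loc = int(dim * local_ratio)
        c_glb = dim - self.c_loc
        self.local = nn.Conv2d(self.c_loc, self.c_loc, 3, padding=1,
                               groups=self.c_loc)   # depthwise conv: local inductive bias
        self.qkv = nn.Conv2d(c_glb, 3 * c_glb, 1)   # linear attention: long-range branch
        self.fuse = nn.Conv2d(dim, dim, 1)          # cross-feature interaction

    def forward(self, x):                            # x: (B, C, H, W)
        xl, xg = x.split([self.c_loc, x.shape[1] - self.c_loc], dim=1)
        xl = self.local(xl)
        b, c, h, w = xg.shape
        q, k, v = self.qkv(xg).flatten(2).transpose(1, 2).chunk(3, dim=-1)
        xg = linear_attn(q, k, v).transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([xl, xg], dim=1))

x = torch.randn(1, 64, 8, 8)
print(DualBranch(64)(x).shape)  # torch.Size([1, 64, 8, 8])
```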
arXiv Detail & Related papers (2024-11-25T07:56:13Z)
- OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
OminiControl is a framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models.
At its core, OminiControl leverages a parameter reuse mechanism, enabling the DiT to encode image conditions using itself as a powerful backbone.
OminiControl addresses a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially-aligned conditions.
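The parameter-reuse idea can be pictured as follows: condition tokens are simply concatenated with the image tokens so the same attention layers process both, with no auxiliary control network. Toy dimensions and module choices; not OminiControl's code.

```python
# Condition tokens join the sequence; one shared attention sees both token sets.
import torch
import torch.nn as nn

class ReuseControlBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, cond_tokens):
        n = img_tokens.shape[1]
        seq = torch.cat([img_tokens, cond_tokens], dim=1)  # joint sequence
        h = self.norm(seq)
        out, _ = self.attn(h, h, h)        # image tokens attend to the condition
        seq = seq + out
        return seq[:, :n], seq[:, n:]

img = torch.randn(2, 16, 64)   # noisy image tokens
cond = torch.randn(2, 16, 64)  # condition tokens from the same embedder
x, _ = ReuseControlBlock(64)(img, cond)
print(x.shape)  # torch.Size([2, 16, 64])
```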
arXiv Detail & Related papers (2024-11-22T17:55:15Z)
- RepControlNet: ControlNet Reparameterization [0.562479170374811]
RepControlNet is proposed to realize controllable generation in diffusion models without increasing inference computation.
Extensive experiments on both SD1.5 and SDXL demonstrate the effectiveness and efficiency of the proposed RepControlNet.
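The reparameterization idea the title points at can be sketched like this: a control branch that is linear in the input can be merged into the base weights at inference, so control adds no runtime cost. The layer names and the merge rule W_base + s·W_ctrl are illustrative assumptions.

```python
# A linear control branch merged into the base layer at inference time.
import torch
import torch.nn as nn

class RepLinear(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.ctrl = nn.Linear(dim, dim, bias=False)  # trained on control data
        nn.init.zeros_(self.ctrl.weight)             # starts as a no-op branch
        self.scale = 1.0

    def forward(self, x):          # training-time: two parallel paths
        return self.base(x) + self.scale * self.ctrl(x)

    @torch.no_grad()
    def merge(self) -> nn.Linear:  # inference-time: a single fused layer
        fused = nn.Linear(self.base.in_features, self.base.out_features)
        fused.weight.copy_(self.base.weight + self.scale * self.ctrl.weight)
        fused.bias.copy_(self.base.bias)
        return fused

layer = RepLinear(8)
x = torch.randn(3, 8)
assert torch.allclose(layer(x), layer.merge()(x), atol=1e-6)  # same output, no extra cost
```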
arXiv Detail & Related papers (2024-08-17T16:21:51Z)
- Function Approximation for Reinforcement Learning Controller for Energy from Spread Waves [69.9104427437916]
Multi-generator Wave Energy Converters (WEC) must handle multiple simultaneous waves arriving from different directions, known as spread waves.
These complex devices need controllers that balance multiple objectives: energy-capture efficiency, reduction of structural stress to limit maintenance, and proactive protection against high waves.
In this paper, we explore different function approximations for the policy and critic networks in modeling the sequential nature of the system dynamics.
arXiv Detail & Related papers (2024-04-17T02:04:10Z)
- Controllable Text Generation with Residual Memory Transformer [4.9329649616940205]
We propose a non-intrusive, lightweight control plugin that accompanies the generation of a causal language model (CLM) at arbitrary time steps.
The proposed plugin, namely the Residual Memory Transformer (RMT), has an encoder-decoder setup and can accept any type of control condition.
Extensive experiments are carried out on various control tasks, in the form of both automatic and human evaluations.
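A hedged sketch of such a non-intrusive plugin: a small encoder-decoder reads the control condition, cross-attends to the frozen LM's hidden states, and contributes a residual correction to the logits. Component sizes and the fusion point are assumptions, not RMT's exact design.

```python
# A residual control plugin layered on a frozen causal LM (illustrative).
import torch
import torch.nn as nn

class ResidualControlPlugin(nn.Module):
    def __init__(self, d_model: int, vocab: int):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.decoder = nn.TransformerDecoderLayer(d_model, 4, batch_first=True)
        self.to_logits = nn.Linear(d_model, vocab)

    def forward(self, lm_hidden, control_emb):
        mem = self.encoder(control_emb)     # encode the control condition
        h = self.decoder(lm_hidden, mem)    # cross-attend from the LM's states
        return self.to_logits(h)            # residual logit correction

d, v = 64, 1000
plugin = ResidualControlPlugin(d, v)
lm_hidden = torch.randn(2, 10, d)           # frozen CLM hidden states
ctrl = torch.randn(2, 5, d)                 # embedded control tokens
base_logits = torch.randn(2, 10, v)
logits = base_logits + plugin(lm_hidden, ctrl)  # the LM itself stays untouched
print(logits.shape)  # torch.Size([2, 10, 1000])
```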
arXiv Detail & Related papers (2023-09-28T08:13:33Z)
- Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution [32.29219284419944]
We propose a cross-refinement adaptive feature modulation transformer (CRAFT) for efficient single-image super-resolution.
We introduce a frequency-guided post-training quantization (PTQ) method aimed at enhancing CRAFT's efficiency.
Our experimental findings showcase CRAFT's superiority over current state-of-the-art methods.
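The frequency-guided part of the PTQ method is specific to the paper; for orientation, the snippet below shows plain per-channel min-max post-training quantization of a weight tensor, the baseline that such methods refine with calibration signals.

```python
# Baseline per-output-channel min-max PTQ of a weight matrix (not the paper's method).
import torch

def quantize_minmax(w: torch.Tensor, bits: int = 8):
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax     # per-channel scale
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

w = torch.randn(16, 32)
q, s = quantize_minmax(w)
print((w - q.float() * s).abs().max())  # quantization error that PTQ methods minimize
```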
arXiv Detail & Related papers (2023-08-09T15:38:36Z)
- Rewiring the Transformer with Depth-Wise LSTMs [55.50278212605607]
We present a Transformer with depth-wise LSTMs connecting cascading Transformer layers and sub-layers.
Experiments with the 6-layer Transformer show significant BLEU improvements on both the WMT14 English-German and English-French tasks and the OPUS-100 many-to-many multilingual NMT task.
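A toy rendering of the rewiring idea: run an LSTM cell across the depth axis, so each layer's output passes through a gate instead of a plain residual connection. Seeding the hidden state with the input embedding is an assumption of this sketch, not necessarily the paper's choice.

```python
# Depth-wise LSTM: the LSTM's "time" axis is the layer stack, not the token axis.
import torch
import torch.nn as nn

class DepthwiseLSTMStack(nn.Module):
    def __init__(self, layers: nn.ModuleList, dim: int):
        super().__init__()
        self.layers = layers
        self.cell = nn.LSTMCell(dim, dim)   # steps once per layer, not per token

    def forward(self, x):                    # x: (B, T, D)
        b, t, d = x.shape
        h = x.reshape(b * t, d)              # hidden state seeded with the input
        c = torch.zeros_like(h)
        for layer in self.layers:            # depth is the LSTM's time axis
            out = layer(h.view(b, t, d)).reshape(b * t, d)
            h, c = self.cell(out, (h, c))    # gated replacement for x + out
        return h.view(b, t, d)

layers = nn.ModuleList(
    nn.TransformerEncoderLayer(32, 4, batch_first=True) for _ in range(6))
model = DepthwiseLSTMStack(layers, 32)
print(model(torch.randn(2, 10, 32)).shape)  # torch.Size([2, 10, 32])
```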
arXiv Detail & Related papers (2020-07-13T09:19:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.