A-SDM: Accelerating Stable Diffusion through Redundancy Removal and
Performance Optimization
- URL: http://arxiv.org/abs/2312.15516v3
- Date: Tue, 5 Mar 2024 03:20:12 GMT
- Title: A-SDM: Accelerating Stable Diffusion through Redundancy Removal and
Performance Optimization
- Authors: Jinchao Zhu, Yuxuan Wang, Xiaobing Tu, Siyuan Pan, Pengfei Wan, Gao
Huang
- Abstract summary: In this work, we first explore the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
- Score: 54.113083217869516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Stable Diffusion Model (SDM) is a popular and efficient
text-to-image (t2i) and image-to-image (i2i) generation model. Although there
have been attempts to reduce sampling steps and to apply model distillation and
network quantization, these previous methods generally retain the original
network architecture. Billion-scale parameters and high computing requirements
have made research on model architecture adjustment scarce. In this work, we
first explore the computationally redundant parts of the network, then prune
the redundant blocks of the model and maintain network performance through a
progressive incubation strategy. Secondly, to maintain model performance, we
add cross-layer multi-expert conditional convolution (CLME-Condconv) to the
block-pruned part so that it inherits the original convolution parameters.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up
the computationally intensive attention part. Finally, we use semantic-aware
supervision (SAS) to align the outputs of the teacher model and the student
model at the semantic level. Experiments show that the proposed method can
effectively train a lightweight model that comes close to the performance of
the original SD model while improving the model speed under limited resources.
After acceleration, the UNet part of the model is 22% faster and the overall
speed is 19% faster.
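The abstract gives no implementation details for CLME-Condconv, but the name
suggests the CondConv pattern: several expert kernels mixed by an
input-dependent router, with the experts presumably initialized from the
convolution weights of the pruned blocks. A minimal PyTorch sketch under those
assumptions (class name, router design, and hyperparameters are hypothetical):

```python
import torch
import torch.nn.functional as F
from torch import nn

class CLMECondConvSketch(nn.Module):
    """Hypothetical multi-expert conditional convolution.

    Assumption: each of the `experts` kernel banks would be initialized
    from a convolution of the pruned blocks ("cross-layer" inheritance);
    here they are randomly initialized for illustration.
    """

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, experts: int = 4):
        super().__init__()
        self.k = k
        # One kernel bank per expert: (E, out, in, k, k).
        self.weight = nn.Parameter(0.02 * torch.randn(experts, out_ch, in_ch, k, k))
        self.router = nn.Linear(in_ch, experts)  # per-example routing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Input-dependent mixture weights over the experts.
        route = torch.softmax(self.router(x.mean(dim=(2, 3))), dim=-1)  # (B, E)
        # Mix the expert kernels per example ...
        mixed = torch.einsum("be,eoihw->boihw", route, self.weight)     # (B, O, I, k, k)
        # ... and apply them with the standard grouped-conv batching trick.
        out = F.conv2d(x.reshape(1, b * c, h, w),
                       mixed.reshape(-1, c, self.k, self.k),
                       padding=self.k // 2, groups=b)
        return out.reshape(b, -1, h, w)
```

The grouped-convolution reshape lets every example in the batch apply its own
mixed kernel in a single conv2d call.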
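Likewise, the GRI attention is only named, not specified. A common way to get
this kind of speedup is to pair cheap global attention over a pooled token grid
with exact attention inside local windows; the sketch below shows that split.
The pooling factor, window size, and the summation used to let the two branches
interact are all assumptions, not the paper's stated design:

```python
import torch
import torch.nn.functional as F
from torch import nn

class GRIAttentionSketch(nn.Module):
    """Hypothetical global-regional interactive attention.

    Assumes H and W are divisible by both `window` and `pool`.
    """

    def __init__(self, dim: int, heads: int = 8, window: int = 8, pool: int = 4):
        super().__init__()
        self.window, self.pool = window, pool
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.regional_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, h, w, c = x.shape                          # (B, H, W, C) feature map
        # Global branch: every token attends to a coarse, average-pooled grid,
        # cutting the key/value count by pool**2.
        coarse = F.avg_pool2d(x.permute(0, 3, 1, 2), self.pool)
        coarse = coarse.flatten(2).transpose(1, 2)    # (B, HW/p^2, C)
        tokens = x.reshape(b, h * w, c)
        g, _ = self.global_attn(tokens, coarse, coarse)
        # Regional branch: exact attention inside non-overlapping windows.
        ws = self.window
        win = x.reshape(b, h // ws, ws, w // ws, ws, c)
        win = win.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)
        r, _ = self.regional_attn(win, win, win)
        r = r.reshape(b, h // ws, w // ws, ws, ws, c)
        r = r.permute(0, 1, 3, 2, 4, 5).reshape(b, h * w, c)
        # "Interaction" here is a simple sum of the two branches.
        return (g + r).reshape(b, h, w, c)
```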
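Finally, semantic-aware supervision plausibly means matching teacher and
student in a semantic feature space rather than elementwise. In the sketch
below, `semantic_encoder` is a placeholder for whatever frozen network (e.g. a
CLIP image encoder applied to decoded outputs) defines that space; it is not
specified in the abstract, and the loss combination is an assumption:

```python
import torch
import torch.nn.functional as F

def sas_loss_sketch(student_out, teacher_out, semantic_encoder):
    """Hypothetical semantic-aware supervision loss.

    `semantic_encoder` is a stand-in for a frozen network mapping model
    outputs into a semantic feature space; pairing the cosine term with a
    plain MSE anchor is an assumption, not the paper's stated recipe.
    """
    with torch.no_grad():
        target = semantic_encoder(teacher_out)   # teacher features, no grads
    pred = semantic_encoder(student_out)
    # Align directions in the semantic space, keep outputs close elementwise.
    cos = 1.0 - F.cosine_similarity(pred.flatten(1), target.flatten(1), dim=-1).mean()
    return cos + F.mse_loss(student_out, teacher_out)
```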
Related papers
- M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation [39.97174784206976]
We show that this scale-wise autoregressive framework can be effectively decoupled into intra-scale modeling.
We apply linear-complexity mechanisms like Mamba to substantially reduce computational overhead.
Experiments demonstrate that our method outperforms existing models in both image quality and generation speed.
arXiv Detail & Related papers (2024-11-15T18:54:42Z)
- RedTest: Towards Measuring Redundancy in Deep Neural Networks Effectively [10.812755570974929]
We use Model Structural Redundancy Score (MSRS) to measure the degree of redundancy in a deep learning model structure.
MSRS is effective in both revealing and assessing the redundancy issues in many state-of-the-art models.
We design a novel redundancy-aware algorithm to guide the search for the optimal model structure.
arXiv Detail & Related papers (2024-11-15T14:36:07Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies [51.7643024367548]
Stable Diffusion Model is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation.
This study focuses on reducing redundant computation in SDM and optimizing the model through both tuning and tuning-free methods.
arXiv Detail & Related papers (2024-05-31T21:47:05Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Fixed Point Diffusion Models [13.035518953879539]
Fixed Point Diffusion Model (FPDM) is a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling.
Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely-related fixed point problems.
We conduct experiments with state-of-the-art models on ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, demonstrating substantial improvements in performance and efficiency.
arXiv Detail & Related papers (2024-01-16T18:55:54Z)
- AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration [57.846038404893626]
We propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training.
Experimental results show that our method achieves excellent performance by using only a few time steps, e.g. a 17.86 FID score on ImageNet 64×64 with only four steps, compared to 138.66 with DDIM.
arXiv Detail & Related papers (2023-09-19T08:57:24Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- A Two-step-training Deep Learning Framework for Real-time Computational Imaging without Physics Priors [0.0]
We propose a two-step-training DL (TST-DL) framework for real-time computational imaging without physics priors.
First, a single fully-connected layer (FCL) is trained to directly learn the model.
Then, this FCL is fixed and connected with an un-trained U-Net architecture for a second-step training that improves the output image fidelity.
arXiv Detail & Related papers (2020-01-10T15:05:43Z)