A-SDM: Accelerating Stable Diffusion through Redundancy Removal and
Performance Optimization
- URL: http://arxiv.org/abs/2312.15516v3
- Date: Tue, 5 Mar 2024 03:20:12 GMT
- Authors: Jinchao Zhu, Yuxuan Wang, Xiaobing Tu, Siyuan Pan, Pengfei Wan, Gao
Huang
- Abstract summary: In this work, we first explore the computational redundancy part of the network.
We then prune the redundancy blocks of the model and maintain the network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
- Score: 54.113083217869516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Stable Diffusion Model (SDM) is a popular and efficient text-to-image
(t2i) generation and image-to-image (i2i) generation model. Although there have
been some attempts to reduce sampling steps, model distillation, and network
quantization, these previous methods generally retain the original network
architecture. Billion scale parameters and high computing requirements make the
research of model architecture adjustment scarce. In this work, we first
explore the computational redundancy part of the network, and then prune the
redundancy blocks of the model and maintain the network performance through a
progressive incubation strategy. Secondly, to maintain the model
performance, we add cross-layer multi-expert conditional convolution
(CLME-Condconv) to the block pruning part to inherit the original convolution
parameters. Thirdly, we propose a global-regional interactive (GRI) attention
to speed up the computationally intensive attention part. Finally, we use
semantic-aware supervision (SAS) to align the outputs of the teacher model and
student model at the semantic level. Experiments show that the proposed method
can effectively train a lightweight model that approaches the performance of the
original SD model and improves model speed under limited resources. After
acceleration, the UNet part of the model is 22% faster and the overall speed is
19% faster.
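The abstract does not spell out how semantic-aware supervision (SAS) aligns the teacher and student outputs. A common way to realize semantic-level distillation is to pool each feature map over its spatial dimensions and penalize the cosine distance between the pooled vectors; the sketch below illustrates that general idea only (the function name and pooling choice are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def semantic_alignment_loss(teacher_feat: np.ndarray,
                            student_feat: np.ndarray) -> float:
    """Toy semantic-level distillation loss (illustrative only).

    Pools a (C, H, W) feature map over its spatial dimensions into a
    C-dimensional descriptor, then returns the cosine distance between
    the teacher's and student's descriptors.
    """
    t = teacher_feat.mean(axis=(1, 2))   # (C,) pooled descriptor
    s = student_feat.mean(axis=(1, 2))
    t = t / (np.linalg.norm(t) + 1e-8)   # unit-normalize
    s = s / (np.linalg.norm(s) + 1e-8)
    return float(1.0 - t @ s)            # 0 when directions coincide

# Identical features incur (near-)zero loss; mismatched ones do not.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))
loss_same = semantic_alignment_loss(feat, feat)
loss_diff = semantic_alignment_loss(feat, rng.normal(size=(8, 4, 4)))
```

In practice such a loss would be added to the ordinary denoising objective when training the pruned student against the frozen teacher.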
Related papers
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z) - A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies [51.7643024367548]
Stable Diffusion Model is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation.
This study focuses on reducing redundant computation in SDM and optimizing the model through both tuning and tuning-free methods.
arXiv Detail & Related papers (2024-05-31T21:47:05Z) - RL for Consistency Models: Faster Reward Guided Text-to-Image Generation [15.238373471473645]
We propose a framework for fine-tuning consistency models via Reinforcement Learning (RL).
Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure.
Compared to RL-finetuned diffusion models, RLCM trains significantly faster, improves generation quality as measured by the reward objectives, and speeds up inference, generating high-quality images in as few as two inference steps.
arXiv Detail & Related papers (2024-03-25T15:40:22Z) - SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions [5.100085108873068]
We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU.
Our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
arXiv Detail & Related papers (2024-03-25T11:16:23Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Fixed Point Diffusion Models [13.035518953879539]
Fixed Point Diffusion Model (FPDM) is a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling.
Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely-related fixed point problems.
We conduct experiments with state-of-the-art models on ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, demonstrating substantial improvements in performance and efficiency.
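The core FPDM idea above, replacing part of a feed-forward pass with an implicit fixed-point solve, can be illustrated with a generic solver that iterates x ← f(x) until convergence. This is a sketch of the concept under simplified assumptions, not FPDM's actual layer:

```python
import numpy as np

def fixed_point_solve(f, x0, max_iters=100, tol=1e-6):
    """Iterate x <- f(x) until successive iterates stop changing.

    In an FPDM-style model, f would be a small learned network and x a
    hidden state; here f can be any contraction mapping.
    """
    x = x0
    for _ in range(max_iters):
        x_next = f(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# f(x) = 0.5*x + 1 is a contraction with unique fixed point x* = 2.
x_star = fixed_point_solve(lambda x: 0.5 * x + 1.0, np.zeros(3))
```

Because each denoising step reuses the same implicit layer, compute can be reallocated across timesteps by varying how many solver iterations each step is given.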
arXiv Detail & Related papers (2024-01-16T18:55:54Z) - AutoDiffusion: Training-Free Optimization of Time Steps and
Architectures for Automated Diffusion Model Acceleration [57.846038404893626]
We propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training.
Experimental results show that our method achieves excellent performance by using only a few time steps, e.g. 17.86 FID score on ImageNet 64 $\times$ 64 with only four steps, compared to 138.66 with DDIM.
arXiv Detail & Related papers (2023-09-19T08:57:24Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - A Two-step-training Deep Learning Framework for Real-time Computational
Imaging without Physics Priors [0.0]
We propose a two-step-training DL (TST-DL) framework for real-time computational imaging without physics priors.
First, a single fully-connected layer (FCL) is trained to directly learn the model.
Then, this FCL is fixed and combined with an un-trained U-Net architecture for a second-step training to improve the output image fidelity.
arXiv Detail & Related papers (2020-01-10T15:05:43Z)
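The two-step recipe above (fit a direct inverse first, then freeze it and train a refinement stage on its fixed output) can be illustrated with a linear toy problem in which both stages reduce to least squares. The two linear maps below are crude stand-ins for the paper's FCL and U-Net, which are of course nonlinear:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 10))        # toy forward model: y = A x
X = rng.normal(size=(100, 10))       # ground-truth signals (flattened)
Y = X @ A.T                          # simulated measurements

# Step 1: train a single layer mapping measurements back to signals.
W1, *_ = np.linalg.lstsq(Y, X, rcond=None)

# Step 2: freeze W1 and fit a second stage on its (fixed) output.
Z = Y @ W1                           # first-step reconstruction, held fixed
W2, *_ = np.linalg.lstsq(Z, X, rcond=None)

recon = Z @ W2
mse = float(np.mean((recon - X) ** 2))
```

Freezing the first stage means the second stage only has to learn a residual correction on a fixed representation, which is what keeps the second training step cheap.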
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.