BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
- URL: http://arxiv.org/abs/2305.15798v3
- Date: Thu, 16 Nov 2023 08:13:06 GMT
- Title: BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
- Authors: Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, Shinkook Choi
- Abstract summary: Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves high computing demands.
Recent studies have reduced sampling steps and applied network quantization while retaining the original architectures.
We uncover the surprising potential of block pruning and feature distillation for low-cost general-purpose T2I.
- Score: 3.1092085121563526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves
high computing demands due to billion-scale parameters. To enhance efficiency,
recent studies have reduced sampling steps and applied network quantization
while retaining the original architectures. The lack of architectural reduction
attempts may stem from worries over expensive retraining for such massive
models. In this work, we uncover the surprising potential of block pruning and
feature distillation for low-cost general-purpose T2I. By removing several
residual and attention blocks from the U-Net of SDMs, we achieve 30%~50%
reduction in model size, MACs, and latency. We show that distillation
retraining is effective even under limited resources: using only 13 A100 days
and a tiny dataset, our compact models can imitate the original SDMs (v1.4 and
v2.1-base with over 6,000 A100 days). Benefiting from the transferred
knowledge, our BK-SDMs deliver competitive results on zero-shot MS-COCO against
larger multi-billion parameter models. We further demonstrate the applicability
of our lightweight backbones in personalized generation and image-to-image
translation. Deployment of our models on edge devices attains 4-second
inference. We hope this work can help build small yet powerful diffusion models
with feasible training budgets. Code and models can be found at:
https://github.com/Nota-NetsPresso/BK-SDM
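
The recipe the abstract describes is compact enough to sketch: remove blocks from the denoising network to obtain a smaller student, then retrain it with the ordinary denoising loss plus distillation against the original model's output and intermediate features. The snippet below is a minimal, self-contained PyTorch illustration of that combined objective; the toy modules, block counts, feature pairing, and equal loss weights are placeholder assumptions, not the released BK-SDM code linked above, which operates on the actual Stable Diffusion U-Net.

```python
# A minimal sketch of block-pruned distillation retraining with toy stand-in
# modules; the released BK-SDM code (linked above) instead operates on the
# diffusers UNet2DConditionModel of Stable Diffusion.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyDenoiser(nn.Module):
    """A stack of residual blocks standing in for U-Net stages."""

    def __init__(self, num_blocks: int):
        super().__init__()
        self.inp = nn.Conv2d(4, 32, 3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.SiLU())
            for _ in range(num_blocks)
        )
        self.out = nn.Conv2d(32, 4, 3, padding=1)

    def forward(self, x):
        h = self.inp(x)
        feats = []
        for blk in self.blocks:
            h = h + blk(h)          # residual block
            feats.append(h)
        return self.out(h), feats


# Teacher keeps all blocks; the "block-pruned" student simply has fewer.
teacher = ToyDenoiser(num_blocks=6).eval()
student = ToyDenoiser(num_blocks=3)

# Hypothetical pairing of student blocks to teacher blocks for feature matching.
feat_pairs = [(0, 1), (1, 3), (2, 5)]

x = torch.randn(2, 4, 8, 8)          # toy noisy latents
target_noise = torch.randn_like(x)   # toy denoising target

with torch.no_grad():
    t_out, t_feats = teacher(x)
s_out, s_feats = student(x)

loss_task = F.mse_loss(s_out, target_noise)   # ordinary denoising objective
loss_out_kd = F.mse_loss(s_out, t_out)        # match the teacher's output
loss_feat_kd = sum(F.mse_loss(s_feats[i], t_feats[j]) for i, j in feat_pairs)

# Equal weighting is an illustrative choice, not the paper's tuned setting.
loss = loss_task + loss_out_kd + loss_feat_kd
loss.backward()
```

In the real setup, x would be noisy VAE latents conditioned on a timestep and text embedding, and this objective would be minimized over a training set, which is where the abstract's 13-A100-day, small-dataset budget applies.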
Related papers
- LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights [2.8461446020965435]
We introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing Latent Diffusion Models.
We demonstrate the effectiveness of our approach on three different tasks: text-to-image (T2I) generation, unconditional image generation (UIG), and unconditional audio generation (UAG).
arXiv Detail & Related papers (2024-04-18T06:35:37Z)
- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions [5.100085108873068]
We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU.
Our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
arXiv Detail & Related papers (2024-03-25T11:16:23Z)
- Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss [6.171638819257848]
Stable Diffusion XL (SDXL) has become the best open-source text-to-image (T2I) model for its versatility and top-notch image quality.
Efficiently addressing the computational demands of SDXL models is crucial for wider reach and applicability.
We introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B and 0.74B parameter UNets, respectively.
arXiv Detail & Related papers (2024-01-05T07:21:46Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Third, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis [52.42320594388199]
We present three key practices in building an efficient text-to-image model.
Based on these findings, we build two types of efficient text-to-image models, called KOALA-Turbo and KOALA-Lightning.
Unlike SDXL, our KOALA models can generate 1024px high-resolution images on consumer-grade GPUs with 8GB of VRAM (e.g., a 3060 Ti).
arXiv Detail & Related papers (2023-12-07T02:46:18Z)
- DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models (see the toy caching sketch after this list).
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
- ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models [59.90959789767886]
We show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions.
By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on CIFAR10 and ImageNet 64$\times$64 and LSUN Cat 256$\times$256 datasets.
arXiv Detail & Related papers (2023-11-23T16:49:06Z)
- SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds [88.06788636008051]
Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers.
These models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run.
We present a generic approach that unlocks running text-to-image diffusion models on mobile devices in less than $2$ seconds.
arXiv Detail & Related papers (2023-06-01T17:59:25Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
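
In contrast with retraining-based compression such as BK-SDM, the DeepCache entry above reuses computation instead of removing it: because adjacent denoising steps produce similar deep features, the expensive part of the network can be recomputed only every few steps and its cached output reused in between. The toy loop below illustrates only that caching pattern; the deep and shallow modules, cache interval, and update rule are invented stand-ins, not the DeepCache implementation, which partitions a real U-Net.

```python
# A toy illustration of temporal feature caching across denoising steps.
# "deep" and "shallow" are invented stand-ins for the expensive and cheap
# parts of a diffusion U-Net; DeepCache itself partitions a real U-Net.
import torch
import torch.nn as nn

deep = nn.Sequential(                       # costly branch, refreshed rarely
    nn.Conv2d(4, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
)
shallow = nn.Conv2d(64 + 4, 4, 3, padding=1)   # cheap branch, run every step

x = torch.randn(1, 4, 16, 16)               # toy latent
num_steps, cache_interval = 20, 5
cached = None

with torch.no_grad():
    for step in range(num_steps):
        if cached is None or step % cache_interval == 0:
            cached = deep(x)                # full pass: refresh the cache
        residual = shallow(torch.cat([cached, x], dim=1))  # reuse cached deep features
        x = x - 0.05 * residual             # stand-in for a denoising update
print(x.shape)
```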
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.