Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer
Level Loss
- URL: http://arxiv.org/abs/2401.02677v1
- Date: Fri, 5 Jan 2024 07:21:46 GMT
- Title: Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer
Level Loss
- Authors: Yatharth Gupta, Vishnu V. Jaddipal, Harish Prabhala, Sayak Paul and
Patrick Von Platen
- Abstract summary: Stable Diffusion XL (SDXL) has become the best open source text-to-image model (T2I) for its versatility and top-notch image quality.
Efficiently addressing the computational demands of SDXL models is crucial for wider reach and applicability.
We introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B and 0.74B parameter UNets, respectively.
- Score: 6.171638819257848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stable Diffusion XL (SDXL) has become the best open source text-to-image
model (T2I) for its versatility and top-notch image quality. Efficiently
addressing the computational demands of SDXL models is crucial for wider reach
and applicability. In this work, we introduce two scaled-down variants, Segmind
Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B and 0.74B parameter
UNets, respectively, achieved through progressive removal using layer-level
losses focusing on reducing the model size while preserving generative quality.
We release the model weights at https://hf.co/Segmind. Our methodology
involves the elimination of residual networks and transformer blocks from the
U-Net structure of SDXL, resulting in significant reductions in parameters and
latency. Our compact models effectively emulate the original SDXL by
capitalizing on transferred knowledge, achieving competitive results against
larger multi-billion parameter SDXL. Our work underscores the efficacy of
knowledge distillation coupled with layer-level losses in reducing model size
while preserving the high-quality generative capabilities of SDXL, thus
facilitating more accessible deployment in resource-constrained environments.
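The abstract does not spell out the loss itself. As a rough illustration only, one common way to combine a denoising task loss with output-level and layer-level (feature) distillation terms is sketched below. This is an assumed formulation in the style of block-pruning work such as BK-SDM, not necessarily the exact objective used for SSD-1B or Segmind-Vega. The function names, the weights w_out and w_feat, and the use of flat lists of floats in place of real feature tensors are all hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layer_level_distillation_loss(
    target_noise: Sequence[float],
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
    w_out: float = 1.0,   # hypothetical weight for the output-level term
    w_feat: float = 1.0,  # hypothetical weight for the layer-level term
) -> float:
    """Assumed objective: task loss + output KD + per-layer feature KD."""
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher must expose the same number of layers")

    # Task loss: the student predicts the noise that was added to the latent.
    task = mse(student_out, target_noise)

    # Output-level distillation: the student matches the teacher's final prediction.
    output_kd = mse(student_out, teacher_out)

    # Layer-level distillation: each retained student block matches the
    # teacher's activation at the corresponding position.
    feature_kd = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))

    return task + w_out * output_kd + w_feat * feature_kd


# Toy call: one retained layer, two-element "feature maps".
loss = layer_level_distillation_loss(
    target_noise=[1.0, 2.0],
    student_out=[1.0, 2.0],
    teacher_out=[1.0, 4.0],
    student_feats=[[0.0, 0.0]],
    teacher_feats=[[1.0, 1.0]],
)
```

In a progressive setting, a loss of this shape would presumably be re-applied after each round of block removal, with the student's remaining blocks paired to the teacher's blocks at matching positions.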
Related papers
- LinFusion: 1 GPU, 1 Minute, 16K Image [71.44735417472043]
We introduce a low-rank approximation of a wide spectrum of popular linear token mixers.
We find that the distilled model, termed LinFusion, achieves performance on par with or superior to the original SD.
Experiments on SD-v1.5, SD-v2.1, and SD-XL demonstrate that LinFusion enables satisfactory and efficient zero-shot cross-resolution generation.
arXiv Detail & Related papers (2024-09-03T17:54:39Z)
- LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models [8.679634923220174]
We propose the layer pruning and normalized distillation for compressing diffusion models (LAPTOP-Diff).
Using the proposed LAPTOP-Diff, we compressed the U-Nets of SDXL and SDM-v1.5 for the most advanced performance, achieving a minimal 4.0% decline in PickScore at a pruning ratio of 50% while the comparative methods' minimal PickScore decline is 8.2%.
arXiv Detail & Related papers (2024-04-17T06:32:42Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network.
We then prune the redundancy blocks of the model and maintain the network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis [52.42320594388199]
We present three key practices in building an efficient text-to-image model.
Based on these findings, we build two types of efficient text-to-image models, called KOALA-Turbo and KOALA-Lightning.
Unlike SDXL, our KOALA models can generate 1024px high-resolution images on consumer-grade GPUs with 8GB of VRAM (3060Ti).
arXiv Detail & Related papers (2023-12-07T02:46:18Z)
- DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
- ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models [59.90959789767886]
We show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions.
By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on the CIFAR10, ImageNet 64$\times$64, and LSUN Cat 256$\times$256 datasets.
arXiv Detail & Related papers (2023-11-23T16:49:06Z)
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis [8.648456572970035]
We present SDXL, a latent diffusion model for text-to-image synthesis.
Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone.
We show drastically improved performance compared to previous versions of Stable Diffusion.
arXiv Detail & Related papers (2023-07-04T23:04:57Z)
- BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion [3.1092085121563526]
Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves high computing demands.
Recent studies have reduced sampling steps and applied network quantization while retaining the original architectures.
We uncover the surprising potential of block pruning and feature distillation for low-cost general-purpose T2I.
arXiv Detail & Related papers (2023-05-25T07:28:28Z)
- Towards Lightweight Super-Resolution with Dual Regression Learning [58.98801753555746]
Deep neural networks have exhibited remarkable performance in image super-resolution (SR) tasks.
The SR problem is typically an ill-posed problem and existing methods would come with several limitations.
We propose a dual regression learning scheme to reduce the space of possible SR mappings.
arXiv Detail & Related papers (2022-07-16T12:46:10Z)
- Learning Robust and Lightweight Model through Separable Structured Transformations [13.208781763887947]
We propose a separable structural transformation of the fully-connected layer to reduce the parameters of convolutional neural networks.
We successfully reduce the amount of network parameters by 90%, while the robust accuracy loss is less than 1.5%.
We evaluate the proposed approach on datasets such as ImageNet, SVHN, and CIFAR-100, as well as on Vision Transformer.
arXiv Detail & Related papers (2021-12-27T07:25:26Z)