Related papers: SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models

SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models

URL: http://arxiv.org/abs/2312.08887v3
Date: Wed, 20 Dec 2023 08:02:09 GMT
Title: SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models
Authors: Weilong Chai, DanDan Zheng, Jiajiong Cao, Zhiquan Chen, Changbao Wang, Chenguang Ma
Abstract summary: We propose a novel Stable-Diffusion (SD) acceleration module called SpeedUpNet(SUN) SUN can be directly plugged into various fine-tuned SD models without extra training. SUN significantly reduces the number of inference steps to just 4 steps and eliminates the need for classifier-free guidance.
Score: 4.484567783139048
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Text-to-image diffusion models (SD) exhibit significant advancements while requiring extensive computational resources. Though many acceleration methods have been proposed, they suffer from generation quality degradation or extra training cost generalizing to new fine-tuned models. To address these limitations, we propose a novel and universal Stable-Diffusion (SD) acceleration module called SpeedUpNet(SUN). SUN can be directly plugged into various fine-tuned SD models without extra training. This technique utilizes cross-attention layers to learn the relative offsets in the generated image results between negative and positive prompts achieving classifier-free guidance distillation with negative prompts controllable, and introduces a Multi-Step Consistency (MSC) loss to ensure a harmonious balance between reducing inference steps and maintaining consistency in the generated output. Consequently, SUN significantly reduces the number of inference steps to just 4 steps and eliminates the need for classifier-free guidance. It leads to an overall speedup of more than 10 times for SD models compared to the state-of-the-art 25-step DPM-solver++, and offers two extra advantages: (1) classifier-free guidance distillation with controllable negative prompts and (2) seamless integration into various fine-tuned Stable-Diffusion models without training. The effectiveness of the SUN has been verified through extensive experimentation. Project Page: https://williechai.github.io/speedup-plugin-for-stable-diffusions.github.io

Related papers

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference [58.06027151683975]
We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion, offering remarkably fast inference speed.<n>Thanks to non-sequential, parallel generation, discrete diffusion models provide a notable speedup to mitigate the inherent latency of token-by-token decoding.
arXiv Detail & Related papers (2025-08-04T08:43:01Z)
CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers [72.23291099555459]
Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures.<n>This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism.<n>ChoRDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation.
arXiv Detail & Related papers (2025-07-21T05:48:47Z)
Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models [14.618774364317053]
We present Morse, a dual-sampling framework for accelerating diffusion models losslessly.<n>Specifically, Morse involves two models called Dash and Dot that interact with each other.<n>By chaining the outputs of the Dash and Dot models run in a time-interleaved fashion, Morse exhibits the merit of flexibly attaining desired image generation performance.
arXiv Detail & Related papers (2025-06-23T02:43:21Z)
FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge [60.000984252907195]
Auto-regressive (AR) models have recently shown promise in visual generation tasks due to their superior sampling efficiency.<n>Video generation requires a substantially larger number of tokens to produce coherent temporal frames, resulting in significant overhead during the decoding phase.<n>We propose the textbfFastCar framework to accelerate the decode phase for the AR video generation by exploring the temporal redundancy.
arXiv Detail & Related papers (2025-05-17T05:00:39Z)
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling [53.33281984430122]
Diffusion Transformer (DiT) is a promising diffusion model for visual generation but incurs significant computational overhead.<n>In this paper, we revisit convolution as an alternative building block for constructing efficient and expressive diffusion models.<n>We introduce Diffusion ConvNet (DiCo), a family of diffusion models built entirely from standard ConvNet modules.
arXiv Detail & Related papers (2025-05-16T12:54:04Z)
FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and FrequencyDe-coupled Fusion module (FreDF) PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
Open-Source Acceleration of Stable-Diffusion.cpp Deployable on All Devices [28.774856591172902]
stable-diffusion.Turbo (Sd) emerges as an efficient inference framework to accelerate the diffusion models. In this work, we present an optimized version of Sd leveraging the Winograd algorithm to accelerate 2D convolution operations. We demonstrate a speedup up to 2.76x for individual convolutional layers and an inference speedup up to 4.79x for the overall image generation process.
arXiv Detail & Related papers (2024-12-08T02:27:17Z)
LinFusion: 1 GPU, 1 Minute, 16K Image [71.44735417472043]
We introduce a low-rank approximation of a wide spectrum of popular linear token mixers. We find that the distilled model, termed LinFusion, achieves performance on par with or superior to the original SD. Experiments on SD-v1.5, SD-v2.1, and SD-XL demonstrate that LinFusion enables satisfactory and efficient zero-shot cross-resolution generation.
arXiv Detail & Related papers (2024-09-03T17:54:39Z)
SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules [9.361932702480361]
We present SwiftDiffusion, a system that efficiently generates high-quality images using stable diffusion models and add-on modules. Compared to state-of-the-art text-to-image serving systems, SwiftDiffusion reduces serving latency by up to 5x and improves serving throughput by up to 2x without compromising image quality.
arXiv Detail & Related papers (2024-07-02T07:59:08Z)
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration [39.51519525071639]
Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos. Fast-FORward CAching (FORA) is designed to accelerate DiT by exploiting the repetitive nature of the diffusion process.
arXiv Detail & Related papers (2024-07-01T16:14:37Z)
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising [49.785626309848276]
AsyncDiff is a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices. For the Stable Diffusion v2.1, AsyncDiff achieves a 2.7x speedup with negligible degradation and a 4.0x speedup with only a slight reduction of 0.38 in CLIP Score. Our experiments also demonstrate that AsyncDiff can be readily applied to video diffusion models with encouraging performances.
arXiv Detail & Related papers (2024-06-11T03:09:37Z)
Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs) Existing binarization methods result in significant performance degradation. We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference [23.49242865222089]
This paper introduces DeepSpeed-FastGen, a system that delivers up to 2.3x higher effective throughput, 2x lower latency on average, and up to 3.7x lower (token-level) tail latency. We leverage a synergistic combination of DeepSpeed-MII and DeepSpeed-Inference to provide an efficient and easy-to-use serving system for large language models.
arXiv Detail & Related papers (2024-01-09T06:49:40Z)
A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network. We then prune the redundancy blocks of the model and maintain the network performance. Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module [52.8517132452467]
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks. This report further extends LCMs' potential by applying LoRA distillation to larger Stable-Diffusion models. We identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA.
arXiv Detail & Related papers (2023-11-09T18:04:15Z)
Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation [115.09597127418452]
Latent-Shift is an efficient text-to-video generation method based on a pretrained text-to-image generation model. We show that Latent-Shift achieves comparable or better results while being significantly more efficient.
arXiv Detail & Related papers (2023-04-17T17:57:06Z)
Wavelet Diffusion Models are fast and scalable Image Generators [3.222802562733787]
Diffusion models are a powerful solution for high-fidelity image generation, which exceeds GANs in quality in many circumstances. Recent DiffusionGAN method significantly decreases the models' running time by reducing the number of sampling steps from thousands to several, but their speeds still largely lag behind the GAN counterparts. This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme. We extract low-and-high frequency components from both image and feature levels via wavelet decomposition and adaptively handle these components for faster processing while maintaining good generation quality.
arXiv Detail & Related papers (2022-11-29T12:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.