SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2312.08887v4
- Date: Tue, 01 Oct 2024 08:30:05 GMT
- Title: SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
- Authors: Weilong Chai, DanDan Zheng, Jiajiong Cao, Zhiquan Chen, Changbao Wang, Chenguang Ma
- Abstract summary: Text-to-image diffusion models such as Stable Diffusion (SD) produce impressive results but require extensive computational resources.
LCM-LoRA, trainable once and reusable across diverse models, offers universality but does not ensure that generated content stays consistent before and after acceleration.
This paper proposes SpeedUpNet (SUN), an innovative acceleration module, to address the challenges of universality and consistency.
- Score: 4.232258232565518
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Text-to-image diffusion models such as Stable Diffusion (SD) produce impressive results but require extensive computational resources. Existing acceleration methods usually require extensive training and are not universally applicable. LCM-LoRA, trainable once and reusable across diverse models, offers universality but does not ensure that generated content stays consistent before and after acceleration. This paper proposes SpeedUpNet (SUN), an innovative acceleration module, to address the challenges of universality and consistency. Exploiting the role of cross-attention layers in the U-Net of SD models, we introduce an adapter specifically designed for these layers that quantifies the offset in image generation caused by negative prompts relative to positive prompts. This learned offset is stable across a range of models, enhancing SUN's universality. To improve output consistency, we propose a Multi-Step Consistency (MSC) loss, which stabilizes the offset and ensures fidelity in accelerated content. Experiments on SD v1.5 show that SUN achieves an overall speedup of more than 10x over the baseline 25-step DPM-Solver++, with two extra advantages: (1) training-free integration into various fine-tuned Stable-Diffusion models and (2) state-of-the-art FIDs on the generated dataset before and after acceleration, guided by random combinations of positive and negative prompts. Code is available: https://williechai.github.io/speedup-plugin-for-stable-diffusions.github.io.
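To make the offset idea concrete, here is a minimal sketch assuming a standard classifier-free-guidance (CFG) setup; `OffsetAdapter`, the stub `unet`, and all shapes are illustrative stand-ins, not the authors' implementation. Standard CFG runs the U-Net twice per step (positive and negative prompt); the abstract's premise is that the negative-vs-positive offset can instead be predicted by a small adapter, so one conditional pass suffices.

```python
import torch
import torch.nn as nn

class OffsetAdapter(nn.Module):
    """Toy stand-in: predicts the positive-vs-negative noise offset from
    the positive text embedding. The real SUN adapter sits inside the
    U-Net's cross-attention layers."""
    def __init__(self, emb_dim: int, latent_ch: int = 4):
        super().__init__()
        self.proj = nn.Linear(emb_dim, latent_ch)

    def forward(self, pos_emb: torch.Tensor) -> torch.Tensor:
        # (B, L, emb_dim) -> (B, latent_ch, 1, 1), broadcast over the latents
        return self.proj(pos_emb.mean(dim=1))[:, :, None, None]

def guided_eps(unet, adapter, x, t, pos_emb, guidance_scale=7.5):
    # Standard CFG: eps = eps_neg + s * (eps_pos - eps_neg)
    #                   = eps_pos + (s - 1) * (eps_pos - eps_neg)
    eps_pos = unet(x, t, pos_emb)    # single conditional U-Net pass
    offset = adapter(pos_emb)        # learned stand-in for eps_pos - eps_neg
    return eps_pos + (guidance_scale - 1.0) * offset

# toy usage with a stub U-Net
unet = lambda x, t, emb: torch.zeros_like(x)
x, emb = torch.randn(1, 4, 64, 64), torch.randn(1, 77, 768)
eps = guided_eps(unet, OffsetAdapter(emb_dim=768), x, torch.tensor([10]), emb)
```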
Related papers
- Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference [58.06027151683975]
We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion, offering remarkably fast inference speed. Thanks to non-sequential, parallel generation, discrete diffusion models provide a notable speedup to mitigate the inherent latency of token-by-token decoding.
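As a rough illustration of the parallel-generation claim, here is a MaskGIT-style decoding sketch, not Seed Diffusion's actual sampler; `model`, `mask_id`, and the commit-half-per-round schedule are assumptions. The point is that a few parallel rounds replace N sequential token steps.

```python
import torch

def parallel_decode(model, tokens, mask_id, rounds=8):
    """tokens: (B, N) long tensor with mask_id at unknown positions."""
    for _ in range(rounds):
        masked = tokens == mask_id
        if not masked.any():
            break
        logits = model(tokens)                    # (B, N, V): one pass
        probs, pred = logits.softmax(-1).max(-1)  # all positions in parallel
        # commit the most confident half of the still-masked predictions
        k = max(1, int(masked.sum()) // 2)
        conf = probs.masked_fill(~masked, -1.0)
        idx = conf.flatten().topk(k).indices
        tokens.view(-1)[idx] = pred.view(-1)[idx]
    return tokens
```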
arXiv Detail & Related papers (2025-08-04T08:43:01Z)
- CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers [72.23291099555459]
Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation.
arXiv Detail & Related papers (2025-07-21T05:48:47Z)
- Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models [14.618774364317053]
We present Morse, a dual-sampling framework for accelerating diffusion models losslessly. Specifically, Morse involves two models, called Dash and Dot, that interact with each other. By chaining the outputs of the Dash and Dot models run in a time-interleaved fashion, Morse can flexibly attain the desired image generation performance.
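A minimal sketch of what time-interleaved chaining could look like; the pairing schedule and Euler-style update here are assumptions, not the paper's algorithm. `solver_step` stands in for one sampler update given a noise prediction.

```python
import torch

def interleaved_sample(x, timesteps, dash, dot, solver_step, dash_every=2):
    """Alternate a large 'Dash' model with a cheap 'Dot' model across steps."""
    for i, t in enumerate(timesteps):
        model = dash if i % dash_every == 0 else dot  # Dot fills in between
        x = solver_step(x, t, model(x, t))            # chain the outputs
    return x

# toy usage with stub models and a toy update rule
dash = lambda x, t: torch.zeros_like(x)   # large, accurate model (stub)
dot = lambda x, t: torch.zeros_like(x)    # small, cheap model (stub)
step = lambda x, t, eps: x - 0.1 * eps
x = interleaved_sample(torch.randn(1, 4, 64, 64), range(16), dash, dot, step)
```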
arXiv Detail & Related papers (2025-06-23T02:43:21Z)
- FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge [60.000984252907195]
Auto-regressive (AR) models have recently shown promise in visual generation tasks due to their superior sampling efficiency. Video generation, however, requires a substantially larger number of tokens to produce coherent temporal frames, resulting in significant overhead during the decoding phase. We propose the FastCar framework to accelerate the decoding phase of AR video generation by exploiting temporal redundancy.
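To illustrate the general replay idea; the similarity test and threshold below are assumptions, not FastCar's actual criterion. If a layer's input barely changes between adjacent frames, its cached output is replayed instead of recomputed.

```python
import torch

class CachedMLP(torch.nn.Module):
    """Wraps a module; replays its cached output when the input is
    nearly unchanged (temporal redundancy across frames)."""
    def __init__(self, mlp: torch.nn.Module, tol: float = 1e-2):
        super().__init__()
        self.mlp, self.tol = mlp, tol
        self._in = self._out = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._in is not None and (x - self._in).abs().mean() < self.tol:
            return self._out          # cache hit: skip the computation
        self._in, self._out = x, self.mlp(x)
        return self._out

layer = CachedMLP(torch.nn.Linear(64, 64))
```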
arXiv Detail & Related papers (2025-05-17T05:00:39Z)
- DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling [53.33281984430122]
Diffusion Transformer (DiT) is a promising diffusion model for visual generation but incurs significant computational overhead. In this paper, we revisit convolution as an alternative building block for constructing efficient and expressive diffusion models. We introduce Diffusion ConvNet (DiCo), a family of diffusion models built entirely from standard ConvNet modules.
arXiv Detail & Related papers (2025-05-16T12:54:04Z)
- FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability.
We propose Parameter-efficient Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF).
PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models.
FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
- Open-Source Acceleration of Stable-Diffusion.cpp Deployable on All Devices [28.774856591172902]
Stable-diffusion.cpp (Sdcpp) emerges as an efficient inference framework to accelerate diffusion models.
In this work, we present an optimized version of Sdcpp leveraging the Winograd algorithm to accelerate 2D convolution operations.
We demonstrate a speedup of up to 2.76x for individual convolutional layers and an inference speedup of up to 4.79x for the overall image generation process.
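For reference, the classic 1-D Winograd identity F(2,3) that 2-D convolution kernels of this kind are built from; this is illustrative NumPy, not the Sdcpp implementation. It computes two outputs of a 3-tap filter with 4 multiplications instead of 6.

```python
import numpy as np

def winograd_f23(d: np.ndarray, g: np.ndarray) -> np.ndarray:
    """d: 4 input values, g: 3 filter taps -> 2 correlation outputs."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2  # filter terms precomputable
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -0.5])
# matches direct sliding-window correlation
assert np.allclose(winograd_f23(d, g), np.convolve(d, g[::-1], "valid"))
```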
arXiv Detail & Related papers (2024-12-08T02:27:17Z)
- LinFusion: 1 GPU, 1 Minute, 16K Image [71.44735417472043]
We introduce a low-rank approximation of a wide spectrum of popular linear token mixers.
We find that the distilled model, termed LinFusion, achieves performance on par with or superior to the original SD.
Experiments on SD-v1.5, SD-v2.1, and SD-XL demonstrate that LinFusion enables satisfactory and efficient zero-shot cross-resolution generation.
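As background, here is a generic kernelized linear-attention sketch of the kind such low-rank token mixers build on; LinFusion's exact formulation differs. Replacing softmax with a positive feature map makes the cost linear in sequence length, which is what makes very high resolutions tractable.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """q, k: (B, N, D); v: (B, N, Dv). Feature map: elu(x) + 1 (always > 0)."""
    q = torch.nn.functional.elu(q) + 1.0
    k = torch.nn.functional.elu(k) + 1.0
    kv = torch.einsum("bnd,bne->bde", k, v)           # (B, D, Dv): O(N), once
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)  # per-query normalization

out = linear_attention(torch.randn(1, 4096, 64),
                       torch.randn(1, 4096, 64),
                       torch.randn(1, 4096, 64))
```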
arXiv Detail & Related papers (2024-09-03T17:54:39Z)
- SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules [9.361932702480361]
We present SwiftDiffusion, a system that efficiently generates high-quality images using stable diffusion models and add-on modules.
Compared to state-of-the-art text-to-image serving systems, SwiftDiffusion reduces serving latency by up to 5x and improves serving throughput by up to 2x without compromising image quality.
arXiv Detail & Related papers (2024-07-02T07:59:08Z)
- FORA: Fast-Forward Caching in Diffusion Transformer Acceleration [39.51519525071639]
Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos.
Fast-FORward CAching (FORA) is designed to accelerate DiT by exploiting the repetitive nature of the diffusion process.
arXiv Detail & Related papers (2024-07-01T16:14:37Z)
- AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising [49.785626309848276]
AsyncDiff is a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices.
For Stable Diffusion v2.1, AsyncDiff achieves a 2.7x speedup with negligible degradation and a 4.0x speedup with only a slight reduction of 0.38 in CLIP Score.
Our experiments also demonstrate that AsyncDiff can be readily applied to video diffusion models with encouraging performances.
arXiv Detail & Related papers (2024-06-11T03:09:37Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential to effectively accelerate advanced diffusion models (DMs).
However, existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
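For intuition, here is a standard XNOR-Net-style weight binarization, a common baseline scheme; BI-DiffSR's actual design adds structures to avoid the degradation noted above. Weights become {-1, +1} times a per-channel scale, shrinking storage roughly 32x.

```python
import torch

def binarize_weights(w: torch.Tensor):
    """w: (out_ch, ...) -> sign(w) and a per-output-channel scale alpha."""
    alpha = w.abs().mean(dim=tuple(range(1, w.dim())), keepdim=True)
    return torch.sign(w), alpha

w = torch.randn(16, 3, 3, 3)          # e.g. a conv weight
w_bin, alpha = binarize_weights(w)
w_approx = alpha * w_bin              # used in place of w at inference
```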
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference [23.49242865222089]
This paper introduces DeepSpeed-FastGen, a system that delivers up to 2.3x higher effective throughput, 2x lower latency on average, and up to 3.7x lower (token-level) tail latency.
We leverage a synergistic combination of DeepSpeed-MII and DeepSpeed-Inference to provide an efficient and easy-to-use serving system for large language models.
arXiv Detail & Related papers (2024-01-09T06:49:40Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Thirdly, we propose global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module [52.8517132452467]
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks.
This report further extends LCMs' potential by applying LoRA distillation to larger Stable-Diffusion models.
We identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA.
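The plug-in mechanics are the standard LoRA merge: the distilled low-rank factors can be folded into any compatible base model's weights with no change in inference cost. A minimal sketch, with illustrative shapes and variable names:

```python
import torch

def merge_lora(w: torch.Tensor, a: torch.Tensor, b: torch.Tensor,
               scale: float = 1.0) -> torch.Tensor:
    """w: (out, in); a: (r, in); b: (out, r) -> merged weight W + s*(B @ A)."""
    return w + scale * (b @ a)

w = torch.randn(320, 320)             # a base-model projection weight
a, b = torch.randn(4, 320), torch.randn(320, 4)   # rank-4 LoRA factors
w_fast = merge_lora(w, a, b, scale=0.8)
```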
arXiv Detail & Related papers (2023-11-09T18:04:15Z)
- Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation [115.09597127418452]
Latent-Shift is an efficient text-to-video generation method based on a pretrained text-to-image generation model.
We show that Latent-Shift achieves comparable or better results while being significantly more efficient.
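For intuition, a TSM-style temporal shift, a common formulation of the idea; the paper's exact placement inside the latent U-Net differs. A fraction of channels is shifted across adjacent frames, adding temporal mixing at zero extra parameters or FLOPs.

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (B, T, C, H, W) -> same shape with 1/fold_div of the channels
    shifted forward in time, another 1/fold_div shifted backward."""
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                  # shift forward
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # shift backward
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # rest unchanged
    return out

y = temporal_shift(torch.randn(1, 16, 64, 32, 32))
```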
arXiv Detail & Related papers (2023-04-17T17:57:06Z)
- Wavelet Diffusion Models are fast and scalable Image Generators [3.222802562733787]
Diffusion models are a powerful solution for high-fidelity image generation, which exceeds GANs in quality in many circumstances.
The recent DiffusionGAN method significantly decreases the models' running time by reducing the number of sampling steps from thousands to several, but its speed still largely lags behind its GAN counterparts.
This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme.
We extract low- and high-frequency components from both image and feature levels via wavelet decomposition and adaptively handle these components for faster processing while maintaining good generation quality.
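A single-level 2-D Haar transform illustrates the split; this is a minimal sketch of one standard wavelet choice, and the paper's scheme operates on both images and features. The input is separated into one low-frequency approximation and three high-frequency detail bands, each at quarter resolution, so the expensive network can operate on the much smaller low-frequency band.

```python
import numpy as np

def haar_dwt2(x: np.ndarray):
    """x: (H, W) with even H, W -> (LL, LH, HL, HH), each (H/2, W/2)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # 2x2 neighborhoods
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a + b - c - d) / 2   # horizontal details
    hl = (a - b + c - d) / 2   # vertical details
    hh = (a - b - c + d) / 2   # diagonal details
    return ll, lh, hl, hh

ll, lh, hl, hh = haar_dwt2(np.random.rand(64, 64))
```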
arXiv Detail & Related papers (2022-11-29T12:25:25Z)