Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
- URL: http://arxiv.org/abs/2310.09469v2
- Date: Wed, 01 Oct 2025 08:10:57 GMT
- Title: Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
- Authors: Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao, Wenping Wang, Yong-Jin Liu,
- Abstract summary: A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed. We propose a timestep tuner that helps find a more accurate integral direction for a particular interval at the minimum cost. Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
- Score: 112.99126045081046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed. Existing acceleration algorithms simplify the sampling by skipping most steps yet exhibit considerable performance degradation. By viewing the generation of diffusion models as a discretized integral process, we argue that the quality drop is partly caused by applying an inaccurate integral direction to a timestep interval. To rectify this issue, we propose a \textbf{timestep tuner} that helps find a more accurate integral direction for a particular interval at the minimum cost. Specifically, at each denoising step, we replace the original parameterization by conditioning the network on a new timestep, enforcing the sampling distribution towards the real one. Extensive experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods, especially when there are few denoising steps. For example, when using 10 denoising steps on LSUN Bedroom dataset, we improve the FID of DDIM from 9.65 to 6.07, simply by adopting our method for a more appropriate set of timesteps. Code is available at \href{https://github.com/THU-LYJ-Lab/time-tuner}{https://github.com/THU-LYJ-Lab/time-tuner}.
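To make the abstract's idea concrete, here is a hedged sketch (not the authors' code) of a DDIM-style sampler in which the network is conditioned on a tuned timestep rather than the nominal one; `eps_model`, `alpha_bar`, and the `tuned` lookup table are illustrative placeholders for the trained noise predictor, the noise schedule, and the learned per-interval timesteps.

```python
import numpy as np

def ddim_step(x, eps, t, t_prev, alpha_bar):
    """One deterministic DDIM update from timestep t to t_prev."""
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)  # predicted clean sample
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps

def sample(eps_model, x, timesteps, tuned, alpha_bar):
    """Few-step sampling loop. `tuned` maps each nominal timestep t to the
    learned conditioning timestep used when querying the network, which is
    the plug-in replacement the abstract describes; an empty dict recovers
    plain DDIM."""
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        eps = eps_model(x, tuned.get(t, t))  # condition on the tuned timestep
        x = ddim_step(x, eps, t, t_prev, alpha_bar)
    return x
```

The update itself is unchanged; only the timestep fed to the network is retargeted, which is why the method adds essentially no inference cost.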
Related papers
- LiteAttention: A Temporal Sparse Attention for Diffusion Transformers [1.3471268811218626]
LiteAttention exploits temporal coherence to enable evolutionary computation skips across the denoising sequence. We implement a highly optimized LiteAttention kernel on top of FlashAttention and demonstrate substantial speedups on production video diffusion models.
arXiv Detail & Related papers (2025-11-14T08:26:55Z)
- Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment [14.097906894386066]
PostDiff is a training-free framework for accelerating pre-trained diffusion models. We show that PostDiff can significantly improve the fidelity-efficiency trade-off of state-of-the-art diffusion models.
arXiv Detail & Related papers (2025-08-08T09:29:37Z)
- Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models [53.087070073434845]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face image quality degradation under a low-latency budget. We propose the Ensemble Parallel Direction solver, a novel ODE solver that mitigates truncation errors by incorporating multiple parallel gradient evaluations in each ODE step.
arXiv Detail & Related papers (2025-07-20T03:08:06Z)
- AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse [19.13826316844611]
Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference.
We provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method.
We propose a novel caching-based acceleration approach for diffusion models, instead of directly reusing cached results.
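For context on the analysis mentioned above: the two-step Adams-Bashforth rule combines the current and previous derivative evaluations, \(y_{k+1} = y_k + \tfrac{h}{2}(3f_k - f_{k-1})\). The following is a minimal sketch of that rule on a scalar ODE, purely to illustrate the numerical method named in the title, not the paper's caching scheme.

```python
import math

def ab2_solve(f, y0, t0, t1, n):
    """Integrate y' = f(t, y) over [t0, t1] in n steps with the two-step
    Adams-Bashforth rule, bootstrapped by a single forward-Euler step."""
    h = (t1 - t0) / n
    t, y = t0, y0
    f_prev = f(t, y)
    y = y + h * f_prev                       # Euler bootstrap (no previous f yet)
    t += h
    for _ in range(n - 1):
        f_curr = f(t, y)
        y = y + h * (3.0 * f_curr - f_prev) / 2.0  # AB2 update
        f_prev = f_curr
        t += h
    return y

# y' = -y with y(0) = 1 has exact solution y(1) = e^{-1}
approx = ab2_solve(lambda t, y: -y, 1.0, 0.0, 1.0, 200)
```

Because each step reuses the previous derivative evaluation, the rule is a natural lens for analyzing feature reuse across denoising steps.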
arXiv Detail & Related papers (2025-04-13T08:29:58Z)
- Efficient Diffusion Training through Parallelization with Truncated Karhunen-Loève Expansion [5.770347328961063]
Diffusion denoising models suffer from slow convergence during training.
We propose a novel forward-time process for training and sampling.
Our method significantly outperforms baseline diffusion models.
arXiv Detail & Related papers (2025-03-22T05:34:02Z)
- Optimizing for the Shortest Path in Denoising Diffusion Model [8.884907787678731]
Shortest Path Diffusion Model (ShortDF) treats the denoising process as a shortest-path problem aimed at minimizing reconstruction error.
Experiments on multiple standard benchmarks demonstrate that ShortDF significantly reduces diffusion time (or steps)
This work paves the way for interactive diffusion-based applications and establishes a foundation for rapid data generation.
arXiv Detail & Related papers (2025-03-05T08:47:36Z)
- Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios [10.57695963534794]
Methods based on VAEs are accompanied by issues of local jitter and global instability.
We introduce a conditional GAN to capture audio control signals and implicitly match the multimodal denoising distribution between the diffusion and denoising steps.
arXiv Detail & Related papers (2024-10-27T07:25:11Z)
- Accelerating Diffusion Sampling with Optimized Time Steps [69.21208434350567]
Diffusion probabilistic models (DPMs) have shown remarkable performance in high-resolution image synthesis.
Their sampling efficiency still leaves much to be desired due to the typically large number of sampling steps.
Recent advancements in high-order numerical ODE solvers for DPMs have enabled the generation of high-quality images with much fewer sampling steps.
arXiv Detail & Related papers (2024-02-27T10:13:30Z)
- Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z)
- Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation [53.04220377034574]
We propose incorporating an analytical image attenuation process into the forward diffusion process for high-quality (un)conditioned image generation.
Our method represents the forward image-to-noise mapping as simultaneous image-to-zero and zero-to-noise mappings.
We have conducted experiments on unconditioned image generation, e.g., CIFAR-10 and CelebA-HQ-256, and image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting.
arXiv Detail & Related papers (2023-06-23T18:08:00Z)
- Parallel Sampling of Diffusion Models [76.3124029406809]
Diffusion models are powerful generative models but suffer from slow sampling.
We present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel.
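Parallel-in-time sampling of this kind builds on Picard-style fixed-point iteration: instead of stepping through the trajectory sequentially, the whole trajectory is refined sweep by sweep, and within one sweep every grid point depends only on the previous sweep, so all model evaluations can run in parallel. A toy sketch on a scalar ODE (an illustration of the principle, not ParaDiGMS itself):

```python
import numpy as np

def picard_parallel(f, y0, ts, sweeps=30):
    """Picard iteration on a grid ts: refine the whole trajectory at once via
    y_{k+1}(t_i) = y0 + sum_{j<i} f(t_j, y_k(t_j)) * dt_j.
    All f evaluations in a sweep read only the previous sweep's trajectory,
    so they are independent and could be batched/parallelized."""
    ts = np.asarray(ts, dtype=float)
    dt = np.diff(ts)
    y = np.full(ts.shape, float(y0))          # initial guess: constant trajectory
    for _ in range(sweeps):
        drift = f(ts[:-1], y[:-1]) * dt       # batched derivative evaluations
        y = np.concatenate(([y0], y0 + np.cumsum(drift)))
    return y
```

After enough sweeps the iteration converges to the same solution the sequential (Euler-discretized) solver would produce, which is why such methods can trade extra parallel compute for lower wall-clock latency.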
arXiv Detail & Related papers (2023-05-25T17:59:42Z)
- DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion [137.8749239614528]
We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD.
Taking as input random temporal proposals, it can yield action proposals accurately given an untrimmed long video.
arXiv Detail & Related papers (2023-03-27T00:40:52Z)
- ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech [63.780196620966905]
We propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech.
ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling.
Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms.
ProDiff enables a sampling speed of 24x faster than real-time on a single NVIDIA 2080Ti GPU.
arXiv Detail & Related papers (2022-07-13T17:45:43Z)
- Accelerating Diffusion Models via Early Stop of the Diffusion Process [114.48426684994179]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved impressive performance on various generation tasks.
In practice, DDPMs often need hundreds or even thousands of denoising steps to obtain a high-quality sample.
We propose a principled acceleration strategy, referred to as Early-Stopped DDPM (ES-DDPM), for DDPMs.
arXiv Detail & Related papers (2022-05-25T06:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.