Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner
- URL: http://arxiv.org/abs/2310.09469v1
- Date: Sat, 14 Oct 2023 02:19:07 GMT
- Title: Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner
- Authors: Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao,
Wenping Wang, Yong-jin Liu
- Abstract summary: A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
- Score: 84.97253871387028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A diffusion model, which is formulated to produce an image using thousands of
denoising steps, usually suffers from a slow inference speed. Existing
acceleration algorithms simplify the sampling by skipping most steps yet
exhibit considerable performance degradation. By viewing the generation of
diffusion models as a discretized integrating process, we argue that the
quality drop is partly caused by applying an inaccurate integral direction to a
timestep interval. To rectify this issue, we propose a timestep aligner that
helps find a more accurate integral direction for a particular interval at the
minimum cost. Specifically, at each denoising step, we replace the original
parameterization by conditioning the network on a new timestep, which is
obtained by aligning the sampling distribution to the real distribution.
Extensive experiments show that our plug-in design can be trained efficiently
and boost the inference performance of various state-of-the-art acceleration
methods, especially when there are few denoising steps. For example, when using
10 denoising steps on the popular LSUN Bedroom dataset, we improve the FID of
DDIM from 9.65 to 6.07, simply by adopting our method for a more appropriate
set of timesteps. Code will be made publicly available.
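As a rough illustration of the plug-in described in the abstract, the sketch below shows how an aligned timestep could replace the nominal one when conditioning the network inside a deterministic DDIM-style sampling loop. The `eps_model` and `aligner` interfaces and the discrete alpha-bar schedule are assumptions made for illustration only; the paper's actual training procedure and released code are not reproduced here.

```python
import torch


@torch.no_grad()
def ddim_sample_with_aligner(eps_model, aligner, alphas_cumprod, timesteps, shape, device="cpu"):
    """Deterministic DDIM-style sampling with a hypothetical timestep-aligner plug-in.

    At each step the network is conditioned on aligner(t) instead of the nominal
    timestep t, following the idea in the abstract. `eps_model(x, t)` and
    `aligner(t)` are assumed interfaces, not the authors' implementation.
    """
    x = torch.randn(shape, device=device)                      # x_T ~ N(0, I)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):       # coarse schedule, e.g. 10 steps, descending
        t_aligned = aligner(t)                                 # learned remapping of the conditioning timestep
        eps = eps_model(x, t_aligned)                          # noise prediction with the aligned condition

        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
        x0_pred = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()       # implied clean sample
        x = a_next.sqrt() * x0_pred + (1.0 - a_next).sqrt() * eps   # DDIM (eta = 0) update
    return x
```

The key point of this sketch is that only the conditioning input to the network changes; the interval endpoints used in the update still come from the original coarse schedule, which matches the abstract's description of replacing the parameterization rather than the timestep grid itself.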
Related papers
- Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios [10.57695963534794]
Methods based on VAEs suffer from local jitter and global instability.
We introduce a conditional GAN to capture audio control signals and implicitly match the multimodal denoising distribution between the diffusion and denoising steps.
arXiv Detail & Related papers (2024-10-27T07:25:11Z)
- Accelerating Diffusion Sampling with Optimized Time Steps [69.21208434350567]
Diffusion probabilistic models (DPMs) have shown remarkable performance in high-resolution image synthesis.
Their sampling efficiency still leaves much to be desired due to the typically large number of sampling steps.
Recent advancements in high-order numerical ODE solvers for DPMs have enabled the generation of high-quality images with much fewer sampling steps.
arXiv Detail & Related papers (2024-02-27T10:13:30Z)
- Parallel Sampling of Diffusion Models [76.3124029406809]
Diffusion models are powerful generative models but suffer from slow sampling.
We present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel.
arXiv Detail & Related papers (2023-05-25T17:59:42Z)
- DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion [137.8749239614528]
We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD.
Taking as input random temporal proposals, it can yield action proposals accurately given an untrimmed long video.
arXiv Detail & Related papers (2023-03-27T00:40:52Z)
- ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech [63.780196620966905]
We propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech.
ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling.
Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms.
ProDiff enables sampling 24x faster than real time on a single NVIDIA 2080Ti GPU.
arXiv Detail & Related papers (2022-07-13T17:45:43Z)
- Accelerating Diffusion Models via Early Stop of the Diffusion Process [114.48426684994179]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved impressive performance on various generation tasks.
In practice, DDPMs often need hundreds or even thousands of denoising steps to obtain a high-quality sample.
We propose a principled acceleration strategy, referred to as Early-Stopped DDPM (ES-DDPM), for DDPMs.
arXiv Detail & Related papers (2022-05-25T06:40:09Z)