Wavelet Diffusion Models are fast and scalable Image Generators
- URL: http://arxiv.org/abs/2211.16152v2
- Date: Wed, 22 Mar 2023 19:11:25 GMT
- Title: Wavelet Diffusion Models are fast and scalable Image Generators
- Authors: Hao Phung, Quan Dao, Anh Tran
- Abstract summary: Diffusion models are a powerful solution for high-fidelity image generation, exceeding GANs in quality in many circumstances.
The recent DiffusionGAN method significantly decreases running time by reducing the number of sampling steps from thousands to several, but its speed still lags well behind that of its GAN counterparts.
This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme.
We extract low- and high-frequency components at both the image and feature levels via wavelet decomposition and handle these components adaptively for faster processing while maintaining good generation quality.
- Score: 3.222802562733787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models are rising as a powerful solution for high-fidelity image
generation, which exceeds GANs in quality in many circumstances. However, their
slow training and inference speed is a huge bottleneck, blocking them from
being used in real-time applications. A recent DiffusionGAN method
significantly decreases the models' running time by reducing the number of
sampling steps from thousands to several, but their speeds still largely lag
behind the GAN counterparts. This paper aims to reduce the speed gap by
proposing a novel wavelet-based diffusion scheme. We extract low-and-high
frequency components from both image and feature levels via wavelet
decomposition and adaptively handle these components for faster processing
while maintaining good generation quality. Furthermore, we propose to use a
reconstruction term, which effectively boosts the model training convergence.
Experimental results on CelebA-HQ, CIFAR-10, LSUN-Church, and STL-10 datasets
prove our solution is a stepping-stone to offering real-time and high-fidelity
diffusion models. Our code and pre-trained checkpoints are available at
https://github.com/VinAIResearch/WaveDiff.git.
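The wavelet decomposition mentioned in the abstract can be illustrated with a single-level 2D Haar transform, which splits an image into one low-frequency subband and three high-frequency subbands at half the spatial resolution. The following is a minimal NumPy sketch, not the authors' implementation: the function names `haar_dwt2` and `haar_idwt2`, the choice of the Haar wavelet, the subband naming, and the channel-stacking step at the end are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    H and W must be even. Returns four (H/2, W/2) subbands. The names
    LL/LH/HL/HH follow one common convention; others swap LH and HL.
    """
    x = np.asarray(x, dtype=np.float64)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the (H, W) array from its subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


# Usage: decompose a random 8x8 "image" and check the round trip.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
subbands = haar_dwt2(image)
restored = haar_idwt2(*subbands)
assert np.allclose(restored, image)

# Hypothetical next step: stack the subbands as channels, giving a
# (4, H/2, W/2) array with 4x fewer spatial positions than the input.
stacked = np.stack(subbands, axis=0)
```

Because the transform is invertible, a model operating on the half-resolution subbands loses no information relative to the pixel representation, which is one plausible reason a wavelet-domain scheme can reduce computation without discarding detail.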
Related papers
- A Wavelet Diffusion GAN for Image Super-Resolution [7.986370916847687]
Diffusion models have emerged as a superior alternative to generative adversarial networks (GANs) for high-fidelity image generation.
However, their real-time feasibility is hindered by slow training and inference speeds.
This study proposes a wavelet-based conditional Diffusion GAN scheme for Single-Image Super-Resolution.
arXiv Detail & Related papers (2024-10-23T15:34:06Z)
- FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner [70.90505084288057]
Flow-based models tend to produce a straighter sampling trajectory during the sampling process.
We introduce several techniques including a pseudo corrector and sample-aware compilation to further reduce inference time.
FlowTurbo reaches an FID of 2.12 on ImageNet at 100 ms/img and an FID of 3.93 at 38 ms/img.
arXiv Detail & Related papers (2024-09-26T17:59:51Z)
- Latent Denoising Diffusion GAN: Faster sampling, Higher image quality [0.0]
Latent Denoising Diffusion GAN employs pre-trained autoencoders to compress images into a compact latent space.
Compared to its predecessors, DiffusionGAN and Wavelet Diffusion, our model shows remarkable improvements in all evaluation metrics.
arXiv Detail & Related papers (2024-06-17T16:32:23Z)
- Efficient Diffusion Model for Image Restoration by Residual Shifting [63.02725947015132]
This study proposes a novel and efficient diffusion model for image restoration.
Our method avoids the need for post-acceleration during inference, thereby avoiding the associated performance deterioration.
Our method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks.
arXiv Detail & Related papers (2024-03-12T05:06:07Z)
- RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction [12.64898580131053]
We introduce RFWave, a cutting-edge multi-band Rectified Flow approach to reconstruct high-fidelity audio waveforms from Mel-spectrograms or discrete acoustic tokens.
RFWave uniquely generates complex spectrograms and operates at the frame level, processing all subbands simultaneously to boost efficiency.
Our empirical evaluations show that RFWave not only provides outstanding reconstruction quality but also offers vastly superior computational efficiency, enabling audio generation at speeds up to 160 times faster than real-time on a GPU.
arXiv Detail & Related papers (2024-03-08T03:16:47Z)
- DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z)
- Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
- ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting [70.83632337581034]
Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed.
We propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps.
Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual.
arXiv Detail & Related papers (2023-07-23T15:10:02Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from long run times, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- WaveDM: Wavelet-Based Diffusion Models for Image Restoration [43.254438752311714]
Wavelet-Based Diffusion Model (WaveDM) learns the distribution of clean images in the wavelet domain conditioned on the wavelet spectrum of degraded images after wavelet transform.
WaveDM achieves state-of-the-art performance with efficiency comparable to traditional one-pass methods.
arXiv Detail & Related papers (2023-05-23T08:41:04Z)
- Accelerating Score-based Generative Models for High-Resolution Image Synthesis [42.076244561541706]
Score-based generative models (SGMs) have recently emerged as a promising class of generative models.
In this work, we consider the acceleration of high-resolution generation with SGMs.
We introduce a novel Target Distribution Aware Sampling (TDAS) method by leveraging the structural priors in space and frequency domains.
arXiv Detail & Related papers (2022-06-08T17:41:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.