Improved Vector Quantized Diffusion Models
- URL: http://arxiv.org/abs/2205.16007v1
- Date: Tue, 31 May 2022 17:59:53 GMT
- Title: Improved Vector Quantized Diffusion Models
- Authors: Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen
- Abstract summary: VQ-Diffusion is a powerful generative model for text-to-image synthesis.
It can nevertheless generate low-quality samples or images weakly correlated with the text input.
We propose two techniques to further improve the sample quality of VQ-Diffusion.
- Score: 34.23016989464389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for
text-to-image synthesis, but it can still sometimes generate low-quality samples
or images weakly correlated with the text input. We find these issues are mainly
due to the flawed sampling strategy. In this paper, we propose two important
techniques to further improve the sample quality of VQ-Diffusion. 1) We explore
classifier-free guidance sampling for discrete denoising diffusion model and
propose a more general and effective implementation of classifier-free
guidance. 2) We present a high-quality inference strategy to alleviate the
joint distribution issue in VQ-Diffusion. Finally, we conduct experiments on
various datasets to validate their effectiveness and show that the improved
VQ-Diffusion surpasses the vanilla version by large margins. We achieve an
8.44 FID score on MSCOCO, an improvement of 5.42 FID over VQ-Diffusion. When
trained on ImageNet, we dramatically improve the FID score from 11.89 to 4.83,
demonstrating the superiority of our proposed techniques.
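Technique 1) combines the model's conditional and unconditional predictions at each denoising step. The sketch below shows the generic logit-space formulation of classifier-free guidance for a discrete (per-token, codebook-logit) model; it is illustrative only and does not reproduce the paper's "more general and effective" implementation, and all names are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # numerically stable softmax over the codebook dimension
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def classifier_free_guidance(cond_logits, uncond_logits, scale):
    """Combine conditional and unconditional denoising predictions.

    Standard classifier-free guidance in logit space:
        guided = uncond + (1 + scale) * (cond - uncond)
    scale = 0 recovers the purely conditional distribution; larger
    scale sharpens the distribution toward the text condition.
    """
    guided = uncond_logits + (1.0 + scale) * (cond_logits - uncond_logits)
    return softmax(guided)
```

In practice the unconditional logits come from a second forward pass with a null (empty) condition, so guided sampling roughly doubles the per-step cost.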
Related papers
- FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation [55.424665700339695]
Diffusion-based audio-driven talking avatar methods have recently gained attention for their high-fidelity, vivid, and expressive results.
Despite the development of various distillation techniques for diffusion models, we found that naive diffusion distillation methods do not yield satisfactory results.
We propose FADA (Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation) to address this problem.
arXiv Detail & Related papers (2024-12-22T08:19:22Z)
- IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis [22.79121512759783]
IV-Mixed Sampler is a novel training-free algorithm for video diffusion models.
It uses IDMs to enhance the quality of each video frame and VDMs to ensure the temporal coherence of the video during the sampling process.
It achieves state-of-the-art performance on four benchmarks including UCF-101-FVD, MSR-VTT-FVD, Chronomagic-Bench-150, and Chronomagic-Bench-1649.
arXiv Detail & Related papers (2024-10-05T14:33:28Z)
- Learning Quantized Adaptive Conditions for Diffusion Models [19.9601581920218]
We propose a novel and effective approach to reduce trajectory curvature by utilizing adaptive conditions.
Our method adds only 1% more training parameters and eliminates the need for extra regularization terms, yet achieves significantly better sample quality.
arXiv Detail & Related papers (2024-09-26T02:49:51Z)
- Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Are Diffusion Models Vision-And-Language Reasoners? [30.579483430697803]
We transform diffusion-based models for any image-text matching (ITM) task using a novel method called DiffusionITM.
We introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation, and detailed analysis.
We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground.
arXiv Detail & Related papers (2023-05-25T18:02:22Z)
- Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
- On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained on pixel space, our approach is able to generate images visually comparable to those of the original model.
For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality [44.37533757879762]
We introduce Differentiable Diffusion Sampler Search (DDSS), a method that optimizes fast samplers for any pre-trained diffusion model.
We also present Generalized Gaussian Diffusion Models (GGDM), a family of flexible non-Markovian samplers for diffusion models.
Our method is compatible with any pre-trained diffusion model without fine-tuning or re-training required.
arXiv Detail & Related papers (2022-02-11T18:53:18Z)
- Cascaded Diffusion Models for High Fidelity Image Generation [53.57766722279425]
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
arXiv Detail & Related papers (2021-05-30T17:14:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.