Improved Vector Quantized Diffusion Models
- URL: http://arxiv.org/abs/2205.16007v1
- Date: Tue, 31 May 2022 17:59:53 GMT
- Title: Improved Vector Quantized Diffusion Models
- Authors: Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen
- Abstract summary: VQ-Diffusion is a powerful generative model for text-to-image synthesis.
It can nevertheless generate low-quality samples or images weakly correlated with the text input.
We propose two techniques to further improve the sample quality of VQ-Diffusion.
- Score: 34.23016989464389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for
text-to-image synthesis, but it can still sometimes generate low-quality samples
or images weakly correlated with the text input. We find these issues are mainly
due to the flawed sampling strategy. In this paper, we propose two important
techniques to further improve the sample quality of VQ-Diffusion. 1) We explore
classifier-free guidance sampling for discrete denoising diffusion model and
propose a more general and effective implementation of classifier-free
guidance. 2) We present a high-quality inference strategy to alleviate the
joint distribution issue in VQ-Diffusion. Finally, we conduct experiments on
various datasets to validate their effectiveness and show that the improved
VQ-Diffusion surpasses the vanilla version by large margins. We achieve an
8.44 FID score on MSCOCO, an improvement of 5.42 FID over VQ-Diffusion. When
trained on ImageNet, we dramatically improve the FID score from 11.89 to 4.83,
demonstrating the superiority of our proposed techniques.
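Technique 1) combines the model's conditional and unconditional predictions at each denoising step. The sketch below shows the generic logit-space formulation of classifier-free guidance for a discrete (per-token, codebook-logit) model; it is illustrative only and does not reproduce the paper's "more general and effective" implementation, and all names are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # numerically stable softmax over the codebook dimension
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def classifier_free_guidance(cond_logits, uncond_logits, scale):
    """Combine conditional and unconditional denoising predictions.

    Standard classifier-free guidance in logit space:
        guided = uncond + (1 + scale) * (cond - uncond)
    scale = 0 recovers the purely conditional distribution; larger
    scale sharpens the distribution toward the text condition.
    """
    guided = uncond_logits + (1.0 + scale) * (cond_logits - uncond_logits)
    return softmax(guided)
```

In practice the unconditional logits come from a second forward pass with a null (empty) condition, so guided sampling roughly doubles the per-step cost.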
Related papers
- FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation [55.424665700339695]
Diffusion-based audio-driven talking avatar methods have recently gained attention for their high-fidelity, vivid, and expressive results.
Despite the development of various distillation techniques for diffusion models, we found that naive diffusion distillation methods do not yield satisfactory results.
We propose FADA (Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation) to address this problem.
arXiv Detail & Related papers (2024-12-22T08:19:22Z)
- IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis [22.79121512759783]
IV-Mixed Sampler is a novel training-free algorithm for video diffusion models.
It uses IDMs to enhance the quality of each video frame and VDMs to ensure the temporal coherence of the video during the sampling process.
It achieves state-of-the-art performance on four benchmarks including UCF-101-FVD, MSR-VTT-FVD, Chronomagic-Bench-150, and Chronomagic-Bench-1649.
arXiv Detail & Related papers (2024-10-05T14:33:28Z)
- Learning Quantized Adaptive Conditions for Diffusion Models [19.9601581920218]
We propose a novel and effective approach to reduce trajectory curvature by utilizing adaptive conditions.
Our method adds only 1% more training parameters and eliminates the need for extra regularization terms, yet achieves significantly better sample quality.
arXiv Detail & Related papers (2024-09-26T02:49:51Z)
- Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Are Diffusion Models Vision-And-Language Reasoners? [30.579483430697803]
We transform diffusion-based models for any image-text matching (ITM) task using a novel method called DiffusionITM.
We introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation, and detailed analysis.
We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground.
arXiv Detail & Related papers (2023-05-25T18:02:22Z)
- Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
- On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained on pixel space, our approach is able to generate images visually comparable to those of the original model.
For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality [44.37533757879762]
We introduce Differentiable Diffusion Sampler Search (DDSS), a method that optimizes fast samplers for any pre-trained diffusion model.
We also present Generalized Gaussian Diffusion Models (GGDM), a family of flexible non-Markovian samplers for diffusion models.
Our method is compatible with any pre-trained diffusion model without fine-tuning or re-training required.
arXiv Detail & Related papers (2022-02-11T18:53:18Z)
- Cascaded Diffusion Models for High Fidelity Image Generation [53.57766722279425]
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
arXiv Detail & Related papers (2021-05-30T17:14:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.