Vector Quantized Diffusion Model for Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2111.14822v1
- Date: Mon, 29 Nov 2021 18:59:46 GMT
- Title: Vector Quantized Diffusion Model for Text-to-Image Synthesis
- Authors: Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen,
Lu Yuan, Baining Guo
- Abstract summary: We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results.
- Score: 47.09451151258849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the vector quantized diffusion (VQ-Diffusion) model for
text-to-image generation. This method is based on a vector quantized
variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional
variant of the recently developed Denoising Diffusion Probabilistic Model
(DDPM). We find that this latent-space method is well-suited for text-to-image
generation tasks because it not only eliminates the unidirectional bias of
existing methods but also allows us to incorporate a mask-and-replace diffusion
strategy to avoid the accumulation of errors, which is a serious problem with
existing methods. Our experiments show that the VQ-Diffusion produces
significantly better text-to-image generation results when compared with
conventional autoregressive (AR) models with similar numbers of parameters.
Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can
handle more complex scenes and improve the synthesized image quality by a large
margin. Finally, we show that the image generation computation in our method
can be made highly efficient by reparameterization. With traditional AR
methods, the text-to-image generation time increases linearly with the output
image resolution and hence is quite time-consuming even for normal-size images.
The VQ-Diffusion allows us to achieve a better trade-off between quality and
speed. Our experiments indicate that the VQ-Diffusion model with the
reparameterization is fifteen times faster than traditional AR methods while
achieving a better image quality.
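For intuition, the mask-and-replace strategy can be pictured as a corruption process over the discrete VQ-VAE token indices: at each forward step a token is either kept, swapped for a uniformly random codebook entry, or absorbed into a [MASK] state, and the denoising network learns to undo this. The following is a minimal NumPy sketch of one such corruption step under assumed per-step probabilities; the names, the fixed `gamma_t`/`beta_t` values, and the loop schedule are illustrative placeholders, not the paper's actual transition matrices.

```python
import numpy as np

def mask_and_replace_step(tokens, mask_id, vocab_size, gamma_t, beta_t, rng):
    """One forward corruption step over discrete VQ-VAE token indices.

    Each not-yet-masked token is, independently:
      - absorbed into the [MASK] state with probability gamma_t,
      - replaced by a uniformly random codebook index with probability beta_t,
      - kept unchanged otherwise.
    Tokens already in the [MASK] state stay there (absorbing state).
    """
    tokens = tokens.copy()
    u = rng.random(tokens.shape)
    active = tokens != mask_id
    to_mask = active & (u < gamma_t)
    to_replace = active & (u >= gamma_t) & (u < gamma_t + beta_t)
    tokens[to_mask] = mask_id
    tokens[to_replace] = rng.integers(0, vocab_size, size=int(to_replace.sum()))
    return tokens

rng = np.random.default_rng(0)
vocab_size = 1024
mask_id = vocab_size                        # [MASK] lives outside the codebook
x0 = rng.integers(0, vocab_size, size=32)   # token indices for one image
xt = x0
for t in range(10):                         # corruption accumulates with t
    xt = mask_and_replace_step(xt, mask_id, vocab_size, 0.1, 0.05, rng)
```

On the reverse side, the reparameterization mentioned in the abstract has the network predict the distribution of the clean tokens x0 directly rather than the one-step posterior; because all token positions are predicted in parallel and denoising steps can be strided, inference cost no longer grows token-by-token with image resolution as in AR decoding, which is consistent with the reported fifteen-fold speedup.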
Related papers
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large, generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR achieves substantially better performance than other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder [29.924160271522354]
Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications.
Most existing methods, however, generate images only at fixed-scale magnification and suffer from over-smoothing and artifacts.
The most relevant prior work applies Implicit Neural Representation (INR) to denoising diffusion models to obtain continuous-resolution yet diverse and high-quality SR results.
We propose a novel pipeline that can super-resolve an input image or generate a novel image from random noise at arbitrary scales.
arXiv Detail & Related papers (2024-03-15T12:45:40Z)
- Iterative Token Evaluation and Refinement for Real-World Super-Resolution [77.74289677520508]
Real-world image super-resolution (RWSR) is a long-standing problem as low-quality (LQ) images often have complex and unidentified degradations.
We propose an Iterative Token Evaluation and Refinement framework for RWSR.
We show that ITER is easier to train than Generative Adversarial Networks (GANs) and more efficient than continuous diffusion models.
arXiv Detail & Related papers (2023-12-09T17:07:32Z)
- Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to quantize Stable Diffusion XL while maintaining its performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z)
- Prompt-tuning latent diffusion models for inverse problems [72.13952857287794]
We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors.
Our method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
arXiv Detail & Related papers (2023-10-02T11:31:48Z)
- Nested Diffusion Processes for Anytime Image Generation [38.84966342097197]
We propose an anytime diffusion-based method that can generate viable images when stopped at arbitrary times before completion (a minimal sketch of the idea appears after this list).
In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that our method's intermediate generation quality greatly exceeds that of the original diffusion model.
arXiv Detail & Related papers (2023-05-30T14:28:43Z)
- Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model.
Our approach yields stronger reported FID scores than the GAN-based baseline, while remaining competitive with VAE-based models on several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z)
- Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation [19.156223720614186]
Integrating the Vector Quantised Variational AutoEncoder with autoregressive models as the generation stage has yielded high-quality results on image generation.
We show that with the help of a content-rich discrete visual codebook from VQ-VAE, the discrete diffusion model can also generate high fidelity images with global context.
arXiv Detail & Related papers (2021-12-03T09:09:34Z)
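Sketch referenced in the Nested Diffusion entry above: a hypothetical anytime sampling loop. The callables `inner_sampler` and `posterior_step` and the toy shapes are stand-ins for the paper's actual components, not its API; the point is only that every outer step yields a complete clean-image estimate, so stopping early still returns a viable sample.

```python
import numpy as np

def anytime_sample(x_T, num_outer_steps, inner_sampler, posterior_step, should_stop):
    """Hypothetical anytime sampler in the spirit of nested diffusion.

    inner_sampler(x_t, t)          -> full clean-image estimate x0_hat (in nested
                                      diffusion this is itself a short diffusion run)
    posterior_step(x_t, x0_hat, t) -> sample at the next (lower) noise level
    should_stop()                  -> True when the caller runs out of time
    """
    x_t = x_T
    x0_hat = None
    for t in reversed(range(num_outer_steps)):
        x0_hat = inner_sampler(x_t, t)      # always a complete, viable image
        if should_stop():                   # anytime property: return early
            return x0_hat
        x_t = posterior_step(x_t, x0_hat, t)
    return x0_hat

# Toy usage with placeholder components (no real model involved):
rng = np.random.default_rng(0)
img = anytime_sample(
    rng.standard_normal((8, 8)),
    num_outer_steps=4,
    inner_sampler=lambda x, t: 0.5 * x,               # placeholder denoiser
    posterior_step=lambda x, x0, t: 0.5 * x + 0.5 * x0,
    should_stop=lambda: False,
)
```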
This list is automatically generated from the titles and abstracts of the papers on this site.