Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
- URL: http://arxiv.org/abs/2112.01799v1
- Date: Fri, 3 Dec 2021 09:09:34 GMT
- Title: Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
- Authors: Minghui Hu, Yujie Wang, Tat-Jen Cham, Jianfei Yang, P.N. Suganthan
- Abstract summary: Integrating a Vector Quantised Variational AutoEncoder with an autoregressive model as the generation component has yielded high-quality results on image generation.
We show that, with the help of a content-rich discrete visual codebook from VQ-VAE, the discrete diffusion model can also generate high-fidelity images with global context.
- Score: 19.156223720614186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of Vector Quantised Variational AutoEncoders (VQ-VAE) with
autoregressive models as the generation component has yielded high-quality
results on image generation. However, autoregressive models strictly follow a
progressive scanning order during the sampling phase, so existing VQ-series
models can hardly escape the trap of lacking global information. Denoising
Diffusion Probabilistic Models (DDPM) in the continuous domain have shown a
capability to capture the global context while generating high-quality images.
In the discrete state space, some works have demonstrated the potential to
perform text generation and low-resolution image generation. We show that, with
the help of a content-rich discrete visual codebook from VQ-VAE, the discrete
diffusion model can also generate high-fidelity images with global context,
compensating for the deficiency of classical autoregressive models operating in
pixel space. Meanwhile, integrating the discrete VAE with the diffusion model
resolves two drawbacks: conventional autoregressive models being oversized, and
diffusion models demanding excessive time in the sampling process when
generating images. We find that the quality of the generated images depends
heavily on the discrete visual codebook. Extensive experiments demonstrate that
the proposed Vector Quantised Discrete Diffusion Model (VQ-DDM) achieves
performance comparable to top-tier methods at low complexity. It also shows
outstanding advantages over other vector-quantised models with autoregressive
priors on image inpainting tasks, without additional training.
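The core idea, diffusing over discrete VQ-VAE codebook indices rather than pixels, can be illustrated with a minimal sketch of the forward corruption process. This is an illustrative toy, not the paper's implementation: it assumes a simple uniform-transition corruption in which each token is resampled from the codebook with probability t/T, and the function and parameter names (`corrupt_tokens`, `K`, `T`) are hypothetical.

```python
import numpy as np

def corrupt_tokens(tokens, t, T, K, rng):
    """Forward discrete-diffusion step (uniform-transition assumption):
    with probability t/T, resample each VQ codebook index uniformly
    from the K codebook entries; otherwise keep the original index."""
    mask = rng.random(tokens.shape) < (t / T)       # which positions to corrupt
    noise = rng.integers(0, K, size=tokens.shape)   # replacement codebook indices
    return np.where(mask, noise, tokens)

# Toy usage: a 4x4 grid of codebook indices with a 512-entry codebook.
rng = np.random.default_rng(0)
grid = rng.integers(0, 512, size=(4, 4))
fully_noised = corrupt_tokens(grid, t=10, T=10, K=512, rng=rng)
```

A reverse model would then be trained to denoise such corrupted token grids in a small number of steps, with the frozen VQ-VAE decoder mapping the final indices back to an image; because every token can be updated at each step, generation is not tied to a raster scanning order.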
Related papers
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR achieves markedly superior performance compared with other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image generation to a level comparable with state-of-the-art diffusion models like SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z) - Deep Equilibrium Approaches to Diffusion Models [1.4275201654498746]
Diffusion-based generative models are extremely effective in generating high-quality images.
These models typically require long sampling chains to produce high-fidelity images.
We look at diffusion models from a different perspective: that of a deep equilibrium (DEQ) fixed-point model.
arXiv Detail & Related papers (2022-10-23T22:02:19Z) - Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and the powerful sequence modeling of auto-regression.
Our method achieves superior and diverse image generation performance compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z) - DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
arXiv Detail & Related papers (2022-06-01T10:39:12Z) - DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from
Low-Dimensional Latents [26.17940552906923]
We present DiffuseVAE, a novel generative framework that integrates a VAE within a diffusion-model framework.
We show that the proposed model can generate high-resolution samples and exhibits quality comparable to state-of-the-art models on standard benchmarks.
arXiv Detail & Related papers (2022-01-02T06:44:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.