DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from
Low-Dimensional Latents
- URL: http://arxiv.org/abs/2201.00308v1
- Date: Sun, 2 Jan 2022 06:44:23 GMT
- Title: DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from
Low-Dimensional Latents
- Authors: Kushagra Pandey, Avideep Mukherjee, Piyush Rai, Abhishek Kumar
- Abstract summary: We present DiffuseVAE, a novel generative framework that integrates a VAE within a diffusion model framework.
We show that the proposed model can generate high-resolution samples and exhibits quality comparable to state-of-the-art models on standard benchmarks.
- Score: 26.17940552906923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion probabilistic models have been shown to generate state-of-the-art
results on several competitive image synthesis benchmarks but lack a
low-dimensional, interpretable latent space, and are slow at generation. On the
other hand, Variational Autoencoders (VAEs) typically have access to a
low-dimensional latent space but exhibit poor sample quality. Despite recent
advances, VAEs usually require high-dimensional hierarchies of the latent codes
to generate high-quality samples. We present DiffuseVAE, a novel generative
framework that integrates a VAE within a diffusion model framework, and leverage
this to design a novel conditional parameterization for diffusion models. We
show that the resulting model can improve upon the unconditional diffusion
model in terms of sampling efficiency while also equipping diffusion models
with the low-dimensional VAE inferred latent code. Furthermore, we show that
the proposed model can generate high-resolution samples and exhibits synthesis
quality comparable to state-of-the-art models on standard benchmarks. Lastly,
we show that the proposed method can be used for controllable image synthesis
and also exhibits out-of-the-box capabilities for downstream tasks like image
super-resolution and denoising. For reproducibility, our source code is
publicly available at https://github.com/kpandey008/DiffuseVAE.
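A minimal sketch of the two-stage pipeline the abstract describes: a VAE decodes a low-dimensional latent into a coarse image, and a DDPM-style reverse process conditioned on that coarse output refines it. The tiny VAE and Denoiser modules, the 32x32 shapes, and the conditioning-by-channel-concatenation are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Toy stand-in for stage 1: decodes a low-dimensional latent to a coarse image."""
    def __init__(self, z_dim=128, img_dim=3 * 32 * 32):
        super().__init__()
        self.dec = nn.Linear(z_dim, img_dim)

    def decode(self, z):
        return self.dec(z).view(-1, 3, 32, 32)

class Denoiser(nn.Module):
    """Toy stand-in for stage 2: predicts noise from the noisy image concatenated
    channel-wise with the VAE reconstruction (timestep embedding omitted for brevity)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, x_t, cond, t):
        return self.net(torch.cat([x_t, cond], dim=1))

@torch.no_grad()
def sample(vae, denoiser, betas, z):
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    x_hat = vae.decode(z)            # stage 1: coarse sample from the low-dim latent
    x = torch.randn_like(x_hat)      # stage 2: DDPM refinement conditioned on x_hat
    for t in reversed(range(len(betas))):
        eps = denoiser(x, x_hat, t)
        mean = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x

betas = torch.linspace(1e-4, 0.02, 50)   # short schedule just for the demo
img = sample(VAE(), Denoiser(), betas, torch.randn(1, 128))
print(img.shape)  # torch.Size([1, 3, 32, 32])
```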
Related papers
- Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance [0.0]
Diffusion models are still challenged by model-induced artifacts and limited stability in image fidelity.
We propose the integration of alias-free resampling layers into the UNet architecture of diffusion models.
Our experimental results on benchmark datasets, including CIFAR-10, MNIST, and MNIST-M, reveal consistent gains in image quality (an anti-aliased resampling sketch appears after this list).
arXiv Detail & Related papers (2024-11-14T04:23:28Z)
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
The Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image synthesis to a level comparable with state-of-the-art diffusion models like SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images (a MaskGIT-style decoding sketch appears after this list).
arXiv Detail & Related papers (2024-10-10T17:59:17Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using Steered Diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution; a guidance-style update sketch appears after this list.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Fast Inference in Denoising Diffusion Models via MMD Finetuning [23.779985842891705]
We present MMD-DDM, a novel method for fast sampling of diffusion models.
Our approach is based on the idea of using the Maximum Mean Discrepancy (MMD) to finetune the learned distribution with a given budget of timesteps.
Our findings show that the proposed method is able to produce high-quality samples in a fraction of the time required by widely used diffusion models (an MMD estimator sketch appears after this list).
arXiv Detail & Related papers (2023-01-19T09:48:07Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows de facto Generative Adversarial Network (GAN)-based approaches.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose DiVAE, a VQ-VAE architecture with a diffusion decoder, to serve as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
- Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation [19.156223720614186]
Integrating the Vector Quantised Variational AutoEncoder (VQ-VAE) with autoregressive models as the generation stage has yielded high-quality results in image generation.
We show that with the help of a content-rich discrete visual codebook from VQ-VAE, the discrete diffusion model can also generate high fidelity images with global context.
arXiv Detail & Related papers (2021-12-03T09:09:34Z)
- Variational Diffusion Models [33.0719137062396]
We introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on image density estimation benchmarks.
We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data (sketched after this list).
arXiv Detail & Related papers (2021-07-01T17:43:20Z)
- Denoising Diffusion Probabilistic Models [91.94962645056896]
We present high quality image synthesis results using diffusion probabilistic models.
Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics; a minimal training-step sketch appears after this list.
arXiv Detail & Related papers (2020-06-19T17:24:44Z)
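For the alias-free resampling entry above: one standard way to make a UNet's downsampling layers less alias-prone is to apply a depthwise low-pass (binomial) filter before striding, as in blur-pooling. The module below is a generic illustration of that idea, not necessarily that paper's exact layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: depthwise binomial low-pass filter, then stride 2."""
    def __init__(self, channels):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)
        k = k / k.sum()                                   # normalized 3x3 binomial kernel
        self.register_buffer("kernel", k.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x):
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")        # pad so spatial dims halve exactly
        return F.conv2d(x, self.kernel, stride=2, groups=self.channels)

x = torch.randn(1, 64, 32, 32)
print(BlurPool2d(64)(x).shape)  # torch.Size([1, 64, 16, 16])
```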
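For the Meissonic entry: non-autoregressive MIM generation is commonly decoded MaskGIT-style, starting from all-mask tokens and committing the most confident predictions over a few steps. The token_model below is a hypothetical stub; the unmasking schedule and shapes are illustrative assumptions, not Meissonic's exact procedure.

```python
import torch

@torch.no_grad()
def mim_decode(token_model, codebook_size, seq_len, steps=8):
    """MaskGIT-style decoding: begin fully masked; at each step, predict every
    position and unmask the most confident still-masked ones."""
    MASK = codebook_size                              # reserve one extra id as the mask token
    tokens = torch.full((1, seq_len), MASK, dtype=torch.long)
    for s in range(steps):
        probs = token_model(tokens).softmax(-1)       # (1, seq_len, codebook_size)
        conf, pred = probs.max(-1)
        masked = tokens == MASK
        if not masked.any():
            break
        n = max(1, int(masked.sum().item() / (steps - s)))  # unmask a growing fraction
        conf = conf.masked_fill(~masked, -1.0)        # never re-commit finished positions
        idx = conf.topk(n, dim=-1).indices
        tokens[0, idx[0]] = pred[0, idx[0]]
    return tokens                                     # ids to feed a VQ decoder for pixels

# Hypothetical stub standing in for a trained bidirectional transformer.
toy = lambda t: torch.randn(t.shape[0], t.shape[1], 1024)
print(mim_decode(toy, codebook_size=1024, seq_len=256).shape)  # torch.Size([1, 256])
```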
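For the Steered Diffusion entry: zero-shot steering of an unconditional diffusion model can be sketched as a classifier-guidance-style gradient update on the sampler state, using a task loss evaluated on the model's current clean-image estimate. The update below is a generic illustration, not the paper's exact formulation; eps_model and guidance_loss are hypothetical.

```python
import torch

def steered_step(x_t, t, eps_model, guidance_loss, abar, scale=1.0):
    """Nudge the sampler state against the gradient of a task loss evaluated
    on the clean-image estimate implied by the noise prediction."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    # Clean-image estimate from x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps.
    x0_hat = (x_t - (1 - abar[t]).sqrt() * eps) / abar[t].sqrt()
    loss = guidance_loss(x0_hat)          # e.g. masked L2 to a reference for inpainting
    grad = torch.autograd.grad(loss, x_t)[0]
    return (x_t - scale * grad).detach()  # then apply the usual DDPM update to this state
```

In a full sampler this steering step would be interleaved with the ordinary denoising update at every timestep.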
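For the MMD-DDM entry: the quantity such finetuning minimizes between generated and real batches is the Maximum Mean Discrepancy. Below is a minimal biased estimator of squared MMD with an RBF kernel; the bandwidth is an illustrative assumption, and in practice one would typically compute this on feature embeddings rather than raw pixels.

```python
import torch

def mmd2_rbf(x, y, sigma=1.0):
    """Biased estimator of squared MMD between batches x, y of shape (n, d),
    using an RBF kernel with bandwidth sigma."""
    xy = torch.cat([x, y], dim=0)
    d2 = torch.cdist(xy, xy).pow(2)               # all pairwise squared distances
    k = torch.exp(-d2 / (2 * sigma ** 2))
    n = x.shape[0]
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

fake = torch.randn(64, 128, requires_grad=True)   # e.g. embeddings of generated samples
real = torch.randn(64, 128)                       # e.g. embeddings of training samples
loss = mmd2_rbf(fake, real)                       # finetune by backpropagating this loss
loss.backward()
```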
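For the Variational Diffusion Models entry: the "remarkably short expression" is, in continuous time, a weighted reconstruction loss whose weight is the derivative of the signal-to-noise ratio, roughly of the form below (notation simplified; see the paper for the exact statement).

```latex
\mathcal{L}_\infty(\mathbf{x})
  = -\tfrac{1}{2}\,\mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0},\mathbf{I})}
    \int_0^1 \mathrm{SNR}'(t)\,
    \bigl\| \mathbf{x} - \hat{\mathbf{x}}_\theta(\mathbf{z}_t; t) \bigr\|_2^2 \, dt,
\qquad
\mathrm{SNR}(t) = \alpha_t^2 / \sigma_t^2 .
```

Since SNR(t) decreases in t, the leading negative sign keeps the loss non-negative; only how fast the SNR falls matters, not how the noise schedule is parameterized.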
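For the Denoising Diffusion Probabilistic Models entry: the weighted variational bound mentioned in the summary reduces to the well-known simplified noise-prediction objective. A minimal training step under a linear beta schedule (the schedule constants and toy interface are illustrative):

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
abar = torch.cumprod(1.0 - betas, dim=0)            # cumulative product \bar{alpha}_t

def ddpm_loss(eps_model, x0):
    """L_simple: corrupt x0 to a random timestep, then predict the injected noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a = abar[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps      # closed-form forward (noising) process
    return F.mse_loss(eps_model(x_t, t), eps)
```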
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.