Related papers: DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents

DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents

URL: http://arxiv.org/abs/2201.00308v1
Date: Sun, 2 Jan 2022 06:44:23 GMT
Title: DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents
Authors: Kushagra Pandey, Avideep Mukherjee, Piyush Rai, Abhishek Kumar
Abstract summary: We present DiffuseVAE, a novel generative framework that integrates VAE within a diffusion model framework. We show that the proposed model can generate high-resolution samples and exhibits quality comparable to state-of-the-art models on standard benchmarks.
Score: 26.17940552906923
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, Variational Autoencoders (VAEs) typically have access to a low-dimensional latent space but exhibit poor sample quality. Despite recent advances, VAEs usually require high-dimensional hierarchies of the latent codes to generate high-quality samples. We present DiffuseVAE, a novel generative framework that integrates VAE within a diffusion model framework, and leverage this to design a novel conditional parameterization for diffusion models. We show that the resulting model can improve upon the unconditional diffusion model in terms of sampling efficiency while also equipping diffusion models with the low-dimensional VAE inferred latent code. Furthermore, we show that the proposed model can generate high-resolution samples and exhibits synthesis quality comparable to state-of-the-art models on standard benchmarks. Lastly, we show that the proposed method can be used for controllable image synthesis and also exhibits out-of-the-box capabilities for downstream tasks like image super-resolution and denoising. For reproducibility, our source code is publicly available at \url{https://github.com/kpandey008/DiffuseVAE}.

Related papers

Exploring Representation-Aligned Latent Space for Better Generation [86.45670422239317]
We introduce ReaLS, which integrates semantic priors to improve generation performance. We show that fundamental DiT and SiT trained on ReaLS can achieve a 15% improvement in FID metric. The enhanced semantic latent space enables more perceptual downstream tasks, such as segmentation and depth estimation.
arXiv Detail & Related papers (2025-02-01T07:42:12Z)
Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance [0.0]
diffusion models are still challenged by model-induced artifacts and limited stability in image fidelity. We propose the integration of alias-free resampling layers into the UNet architecture of diffusion models. Our experimental results on benchmark datasets, including CIFAR-10, MNIST, and MNIST-M, reveal consistent gains in image quality.
arXiv Detail & Related papers (2024-11-14T04:23:28Z)
Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step. Our framework offers a 1.3$times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which non-autoregressive masked image modeling (MIM) text-to-image elevates to a level comparable with state-of-the-art diffusion models like SDXL. We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution. Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z)
Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation. We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
Fast Inference in Denoising Diffusion Models via MMD Finetuning [23.779985842891705]
We present MMD-DDM, a novel method for fast sampling of diffusion models. Our approach is based on the idea of using the Maximum Mean Discrepancy (MMD) to finetune the learned distribution with a given budget of timesteps. Our findings show that the proposed method is able to produce high-quality samples in a fraction of the time required by widely-used diffusion models.
arXiv Detail & Related papers (2023-01-19T09:48:07Z)
Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the emphde facto Generative Adversarial Nets (GANs)
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis. Our model achieves state-of-the-art results and generates more photorealistic images specifically.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation [19.156223720614186]
The integration of Vector Quantised Variational AutoEncoder with autoregressive models as generation part has yielded high-quality results on image generation. We show that with the help of a content-rich discrete visual codebook from VQ-VAE, the discrete diffusion model can also generate high fidelity images with global context.
arXiv Detail & Related papers (2021-12-03T09:09:34Z)
Variational Diffusion Models [33.0719137062396]
We introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on image density estimation benchmarks. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data.
arXiv Detail & Related papers (2021-07-01T17:43:20Z)
Denoising Diffusion Probabilistic Models [91.94962645056896]
We present high quality image synthesis results using diffusion probabilistic models. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics.
arXiv Detail & Related papers (2020-06-19T17:24:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.