Variational Diffusion Auto-encoder: Latent Space Extraction from
Pre-trained Diffusion Models
- URL: http://arxiv.org/abs/2304.12141v2
- Date: Thu, 18 May 2023 22:44:12 GMT
- Title: Variational Diffusion Auto-encoder: Latent Space Extraction from
Pre-trained Diffusion Models
- Authors: Georgios Batzolis, Jan Stanczuk, Carola-Bibiane Schönlieb
- Abstract summary: Variational Auto-Encoders (VAEs) face challenges with the quality of generated images, often presenting noticeable blurriness.
This issue stems from the unrealistic assumption that approximates the conditional data distribution, $p(\textbf{x} | \textbf{z})$, as an isotropic Gaussian.
We illustrate how one can extract a latent space from a pre-existing diffusion model by optimizing an encoder to maximize the marginal data log-likelihood.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a widely recognized approach to deep generative modeling, Variational
Auto-Encoders (VAEs) still face challenges with the quality of generated
images, often presenting noticeable blurriness. This issue stems from the
unrealistic assumption that approximates the conditional data distribution,
$p(\textbf{x} | \textbf{z})$, as an isotropic Gaussian. In this paper, we
propose a novel solution to address these issues. We illustrate how one can
extract a latent space from a pre-existing diffusion model by optimizing an
encoder to maximize the marginal data log-likelihood. Furthermore, we
demonstrate that a decoder can be analytically derived post encoder-training,
employing the Bayes rule for scores. This leads to a VAE-esque deep latent
variable model, which discards the need for Gaussian assumptions on
$p(\textbf{x} | \textbf{z})$ or the training of a separate decoder network. Our
method, which capitalizes on the strengths of pre-trained diffusion models and
equips them with latent spaces, results in a significant enhancement to the
performance of VAEs.
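The "Bayes rule for scores" used to derive the decoder can be sketched as follows. This is the standard identity, written in the abstract's notation; the paper's time-dependent form along the diffusion process may differ:

```latex
% Bayes rule, p(x|z) = p(x) p(z|x) / p(z), after taking log-gradients
% in x; the log p(z) term vanishes since it is constant in x:
\nabla_{\mathbf{x}} \log p(\mathbf{x} \mid \mathbf{z})
  = \nabla_{\mathbf{x}} \log p(\mathbf{x})
  + \nabla_{\mathbf{x}} \log p(\mathbf{z} \mid \mathbf{x})
```

The first term is the score supplied by the pre-trained diffusion model, and the second is obtained from the trained encoder, which is why no separate decoder network needs to be trained.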
Related papers
- Continuous Speculative Decoding for Autoregressive Image Generation [33.05392461723613]
Continuous-valued Autoregressive (AR) image generation models have demonstrated notable superiority over their discrete-token counterparts.
Speculative decoding has proven effective in accelerating Large Language Models (LLMs).
This work generalizes the speculative decoding algorithm from discrete tokens to continuous space.
arXiv Detail & Related papers (2024-11-18T09:19:15Z)
- Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x} \sim p^{\rm post}(\mathbf{x}) \propto p(\mathbf{x}) r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$.
We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from
arXiv Detail & Related papers (2024-05-31T16:18:46Z)
- Upgrading VAE Training With Unlimited Data Plans Provided by Diffusion Models [12.542073306638988]
We show that overfitting encoders in VAEs can be effectively mitigated by training on samples from a pre-trained diffusion model.
We analyze generalization performance, amortization gap, and robustness of VAEs trained with our proposed method on three different data sets.
arXiv Detail & Related papers (2023-10-30T15:38:39Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z)
- Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation [70.2283756542824]
Dior-CVAE is a hierarchical conditional variational autoencoder (CVAE) with diffusion priors to address these challenges.
We employ a diffusion model to increase the complexity of the prior distribution and its compatibility with the distributions produced by a PLM.
Experiments across two commonly used open-domain dialog datasets show that our method can generate more diverse responses without large-scale dialog pre-training.
arXiv Detail & Related papers (2023-05-24T11:06:52Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate them as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
- Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes [23.682509357305406]
Autoencoders and their variants are among the most widely used models in representation learning and generative modeling.
We propose a novel Sparse Gaussian Process Bayesian Autoencoder model in which we impose fully sparse Gaussian Process priors on the latent space of a Bayesian Autoencoder.
arXiv Detail & Related papers (2023-02-09T09:57:51Z)
- Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model.
Our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z)
- Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x}, \mathbf{y})$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise.
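The plug-and-play construction combines the score of the prior with the gradient of the constraint. A minimal toy sketch of that combination is below; `prior_score` and `constraint_grad` are hypothetical stand-ins for the paper's denoising network and differentiable constraint, and a closed-form Gaussian toy prior replaces the actual diffusion model:

```python
import numpy as np

def prior_score(x):
    # Stand-in for a learned score grad_x log p(x); here a standard
    # Gaussian prior, whose score is simply -x.
    return -x

def constraint_grad(x, y):
    # Stand-in for grad_x log c(x, y); here a Gaussian likelihood
    # centered at the observation y, whose gradient pulls x toward y.
    return y - x

def posterior_step(x, y, step_size=0.1):
    # One gradient-ascent step on log p(x) + log c(x, y), i.e. an
    # (un-noised) step toward the mode of the product density.
    return x + step_size * (prior_score(x) + constraint_grad(x, y))

x = np.zeros(4)
y = np.ones(4)
for _ in range(100):
    x = posterior_step(x, y)
# x converges to the mode of the product density, here y / 2.
```

In the paper, each step differentiates through the fixed denoising network at a chosen noise level rather than using a closed-form score, but the structure of the update is the same.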
arXiv Detail & Related papers (2022-06-17T21:11:36Z)
- Exponentially Tilted Gaussian Prior for Variational Autoencoder [3.52359746858894]
Recent studies show that probabilistic generative models can perform poorly on this task.
We propose the exponentially tilted Gaussian prior distribution for the Variational Autoencoder (VAE).
We show that our model produces high quality image samples which are crisper than those of a standard Gaussian VAE.
arXiv Detail & Related papers (2021-11-30T18:28:19Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.