VAEBM: A Symbiosis between Variational Autoencoders and Energy-based
Models
- URL: http://arxiv.org/abs/2010.00654v3
- Date: Thu, 4 Nov 2021 23:49:01 GMT
- Title: VAEBM: A Symbiosis between Variational Autoencoders and Energy-based
Models
- Authors: Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat
- Abstract summary: Energy-based models (EBMs) have recently been successful in representing complex distributions of small images.
VAEBM captures the overall mode structure of the data distribution using a state-of-the-art VAE.
It relies on its EBM component to explicitly exclude non-data-like regions from the model and refine the image samples.
- Score: 84.14682116977433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based models (EBMs) have recently been successful in representing
complex distributions of small images. However, sampling from them requires
expensive Markov chain Monte Carlo (MCMC) iterations that mix slowly in high
dimensional pixel space. Unlike EBMs, variational autoencoders (VAEs) generate
samples quickly and are equipped with a latent space that enables fast
traversal of the data manifold. However, VAEs tend to assign high probability
density to regions in data space outside the actual data distribution and often
fail at generating sharp images. In this paper, we propose VAEBM, a symbiotic
composition of a VAE and an EBM that offers the best of both worlds. VAEBM
captures the overall mode structure of the data distribution using a
state-of-the-art VAE and it relies on its EBM component to explicitly exclude
non-data-like regions from the model and refine the image samples. Moreover,
the VAE component in VAEBM allows us to speed up MCMC updates by
reparameterizing them in the VAE's latent space. Our experimental results show
that VAEBM outperforms state-of-the-art VAEs and EBMs in generative quality on
several benchmark image datasets by a large margin. It can generate
high-quality images as large as 256$\times$256 pixels with short MCMC chains.
We also demonstrate that VAEBM provides complete mode coverage and performs
well in out-of-distribution detection. The source code is available at
https://github.com/NVlabs/VAEBM
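The abstract's key efficiency claim is that MCMC updates are reparameterized in the VAE's latent space rather than pixel space. A minimal sketch of that idea, using Langevin dynamics, is below; `grad_energy(eps)` and `decode(eps)` are hypothetical stand-ins for the gradient of the joint energy (the EBM energy of the decoded image plus the N(0, I) prior on the base noise) and the VAE decoder, and are not the paper's actual implementation.

```python
import numpy as np

def latent_langevin_sample(grad_energy, decode, dim=4, n_steps=200, step=1e-2, seed=0):
    """Sketch of VAEBM-style sampling: run a short Langevin chain in the
    VAE's latent/base-noise space instead of high-dimensional pixel space.

    grad_energy(eps): gradient of the joint energy w.r.t. the base noise
    decode(eps):      maps base noise through the VAE decoder to an image
    Both are assumed callables supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(dim)  # initialize from the N(0, I) base distribution
    for _ in range(n_steps):
        # Langevin update: half gradient step plus injected Gaussian noise
        eps = eps - 0.5 * step * grad_energy(eps) + np.sqrt(step) * rng.standard_normal(dim)
    return decode(eps)
```

With a toy quadratic energy (gradient equal to `eps` itself, identity decoder), the chain simply samples from the standard normal base distribution; in VAEBM the same short chain refines decoded images because the decoder already places the sample near the data manifold.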
Related papers
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image synthesis to a level comparable with state-of-the-art diffusion models such as SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z) - HMT-UNet: A Hybrid Mamba-Transformer Vision UNet for Medical Image Segmentation [1.5574423250822542]
We propose a U-shaped architecture for medical image segmentation, named Hybrid Transformer vision Mamba UNet (HTM-UNet).
We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB, ETIS-Larib PolypDB public datasets and ZD-LCI-GIM private dataset.
arXiv Detail & Related papers (2024-08-21T02:25:14Z) - Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis [56.849285913695184]
Diffusion Mamba (DiM) is a sequence model for efficient high-resolution image synthesis.
DiM architecture achieves inference-time efficiency for high-resolution images.
Experiments demonstrate the effectiveness and efficiency of our DiM.
arXiv Detail & Related papers (2024-05-23T06:53:18Z) - Improving Denoising Diffusion Probabilistic Models via Exploiting Shared
Representations [5.517338199249029]
SR-DDPM is a class of generative models that produce high-quality images by reversing a noisy diffusion process.
By exploiting the similarity between diverse data distributions, our method can scale to multiple tasks without compromising the image quality.
We evaluate our method on standard image datasets and show that it outperforms both unconditional and conditional DDPM in terms of FID and SSIM metrics.
arXiv Detail & Related papers (2023-11-27T22:30:26Z) - PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and
Localization [64.39761523935613]
We present a new framework for Patch Distribution Modeling, PaDiM, to concurrently detect and localize anomalies in images.
PaDiM makes use of a pretrained convolutional neural network (CNN) for patch embedding.
It also exploits correlations between the different semantic levels of CNN to better localize anomalies.
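The PaDiM summary above describes modeling patch embeddings from a pretrained CNN; the standard formulation fits a multivariate Gaussian per patch position and scores test patches by Mahalanobis distance. A minimal sketch under the assumption that multi-level features have already been extracted and concatenated (the `train_feats`/`test_feats` arrays are hypothetical inputs, not PaDiM's actual API):

```python
import numpy as np

def padim_scores(train_feats, test_feats, eps=1e-2):
    """Per-patch anomaly scores in the style of PaDiM.

    train_feats: (N, P, D) patch embeddings from N normal images,
                 P patch positions, D concatenated feature dims.
    test_feats:  (M, P, D) embeddings of M test images.
    Returns (M, P) Mahalanobis distances to the per-position Gaussian
    fit on the training embeddings.
    """
    N, P, D = train_feats.shape
    scores = np.empty(test_feats.shape[:2])
    for p in range(P):
        mu = train_feats[:, p].mean(axis=0)
        # Regularized covariance keeps the inverse well-conditioned
        cov = np.cov(train_feats[:, p], rowvar=False) + eps * np.eye(D)
        inv = np.linalg.inv(cov)
        diff = test_feats[:, p] - mu
        # Mahalanobis distance: sqrt(diff^T inv diff), batched over M
        scores[:, p] = np.sqrt(np.einsum("md,de,me->m", diff, inv, diff))
    return scores
```

Concatenating features from several semantic levels of the CNN before fitting the Gaussians is what lets the covariance capture the cross-level correlations the summary mentions.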
arXiv Detail & Related papers (2020-11-17T17:29:18Z) - NVAE: A Deep Hierarchical Variational Autoencoder [102.29977384039805]
We propose a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization.
We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models.
To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels.
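The NVAE summary credits depth-wise separable convolutions for scaling the architecture; the benefit is easy to see from parameter counts, since a depth-wise k×k filter per channel followed by a 1×1 pointwise mix replaces one dense k×k convolution. A small back-of-the-envelope sketch (bias terms ignored):

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depth-wise k x k conv (one filter per input channel)
    followed by a 1x1 pointwise convolution (bias ignored)."""
    return c_in * k * k + c_in * c_out

# e.g. a 256 -> 256 channel layer with a 5x5 kernel:
# standard:            256 * 256 * 25 = 1,638,400 parameters
# depthwise separable: 256 * 25 + 256 * 256 = 71,936 parameters
```

At these channel widths the separable form uses roughly 23x fewer parameters, which is why it enables the large kernels NVAE applies at high resolutions.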
arXiv Detail & Related papers (2020-07-08T04:56:56Z) - AE-OT-GAN: Training GANs from data specific latent distribution [21.48007565143911]
Generative adversarial networks (GANs) are prominent models for generating realistic and crisp images.
GANs often encounter mode collapse problems and are hard to train, which stems from approximating the intrinsic discontinuous distribution transform map with continuous DNNs.
The recently proposed AE-OT model addresses this problem by explicitly computing the discontinuous distribution transform map.
In this paper, we propose the AE-OT-GAN model to combine the advantages of both models: generating high-quality images while overcoming the mode collapse/mixture problems.
arXiv Detail & Related papers (2020-01-11T01:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.