NVAE: A Deep Hierarchical Variational Autoencoder
- URL: http://arxiv.org/abs/2007.03898v3
- Date: Fri, 8 Jan 2021 03:08:58 GMT
- Title: NVAE: A Deep Hierarchical Variational Autoencoder
- Authors: Arash Vahdat, Jan Kautz
- Abstract summary: We propose a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization.
We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models.
To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels.
- Score: 102.29977384039805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalizing flows, autoregressive models, variational autoencoders (VAEs),
and deep energy-based models are among competing likelihood-based frameworks
for deep generative learning. Among them, VAEs have the advantage of fast and
tractable sampling and easy-to-access encoding networks. However, they are
currently outperformed by other models such as normalizing flows and
autoregressive models. While the majority of the research in VAEs is focused on
the statistical challenges, we explore the orthogonal direction of carefully
designing neural architectures for hierarchical VAEs. We propose Nouveau VAE
(NVAE), a deep hierarchical VAE built for image generation using depth-wise
separable convolutions and batch normalization. NVAE is equipped with a
residual parameterization of Normal distributions and its training is
stabilized by spectral regularization. We show that NVAE achieves
state-of-the-art results among non-autoregressive likelihood-based models on
the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong
baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art
from 2.98 to 2.91 bits per dimension, and it produces high-quality images on
CelebA HQ. To the best of our knowledge, NVAE is the first successful VAE
applied to natural images as large as 256$\times$256 pixels. The source code is
available at https://github.com/NVlabs/NVAE .
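
The abstract names a residual parameterization of Normal distributions and reports results in bits per dimension without spelling either out. The sketch below is one reading of both, not the authors' code. It assumes the approximate posterior is defined relative to the prior as N(mu_p + delta_mu, sigma_p * delta_sigma), so that the per-latent KL term depends only on the residuals and the prior scale, and it assumes the usual conversion from a negative log-likelihood in nats to bits per dimension. All function and parameter names are hypothetical.

```python
import math


def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between two univariate Normal distributions."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def kl_residual(delta_mu, delta_sigma, sigma_p):
    """KL(q || p) when q = N(mu_p + delta_mu, sigma_p * delta_sigma).

    Assumed form of the residual parameterization: the prior mean cancels,
    so the KL is a function of the residuals and the prior scale only.
    """
    return 0.5 * (delta_mu ** 2 / sigma_p ** 2
                  + delta_sigma ** 2
                  - math.log(delta_sigma ** 2)
                  - 1.0)


def nats_to_bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2.0))


# Illustrative values only; they are not taken from the paper.
mu_p, sigma_p = 1.0, 0.7
delta_mu, delta_sigma = 0.3, 1.2

direct = kl_normal(mu_p + delta_mu, sigma_p * delta_sigma, mu_p, sigma_p)
residual = kl_residual(delta_mu, delta_sigma, sigma_p)
print(abs(direct - residual) < 1e-12)

# A CIFAR-10 image has 32 * 32 * 3 = 3072 dimensions.
cifar_dims = 32 * 32 * 3
print(nats_to_bits_per_dim(2.91 * cifar_dims * math.log(2.0), cifar_dims))
```

Under this assumed form, zero residuals (delta_mu = 0, delta_sigma = 1) give a KL of zero, which is one way to see why a posterior defined relative to the prior could be easier to keep close to it during training.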
Related papers
- Quantum Down Sampling Filter for Variational Auto-encoder [0.504868948270058]
Variational Autoencoders (VAEs) are essential tools in generative modeling and image reconstruction.
This study aims to improve the quality of reconstructed images by enhancing their resolution and preserving finer details.
We propose a hybrid model that combines quantum computing techniques in the VAE encoder with convolutional neural networks (CNNs) in the decoder.
arXiv Detail & Related papers (2025-01-09T11:08:55Z)
- Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models [34.15905637499148]
We propose aligning the latent space with pre-trained vision foundation models when training the visual tokenizers.
Our proposed VA-VAE significantly expands the reconstruction-generation frontier of latent diffusion models.
We build an enhanced DiT baseline with improved training strategies and architecture designs, termed LightningDiT.
arXiv Detail & Related papers (2025-01-02T18:59:40Z)
- Jet: A Modern Transformer-Based Normalizing Flow [62.2573739835562]
We revisit the design of coupling-based normalizing flow models by carefully ablating prior design choices.
We achieve state-of-the-art quantitative and qualitative performance with a much simpler architecture.
arXiv Detail & Related papers (2024-12-19T18:09:42Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Lossy Image Compression with Quantized Hierarchical VAEs [33.173021636656465]
ResNet VAEs were originally designed for data (image) distribution modeling.
We present a powerful and efficient model that outperforms previous methods on natural image lossy compression.
Our model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding.
arXiv Detail & Related papers (2022-08-27T17:15:38Z)
- Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling [79.15521784128102]
We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs).
In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way.
We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation.
arXiv Detail & Related papers (2021-03-16T07:01:08Z)
- Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images [9.667538864515285]
We present a hierarchical VAE that, for the first time, generates samples quickly while outperforming the PixelCNN in log-likelihood on all natural image benchmarks.
In theory, VAEs can represent autoregressive models, as well as faster, better models if they exist, when made sufficiently deep.
arXiv Detail & Related papers (2020-11-20T21:35:31Z)
- A Contrastive Learning Approach for Training Variational Autoencoder Priors [137.62674958536712]
Variational autoencoders (VAEs) are powerful likelihood-based generative models with applications in many domains.
One explanation for VAEs' poor generative quality is the prior hole problem: the prior distribution fails to match the aggregate approximate posterior.
We propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
arXiv Detail & Related papers (2020-10-06T17:59:02Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.