NVAE: A Deep Hierarchical Variational Autoencoder
- URL: http://arxiv.org/abs/2007.03898v3
- Date: Fri, 8 Jan 2021 03:08:58 GMT
- Title: NVAE: A Deep Hierarchical Variational Autoencoder
- Authors: Arash Vahdat, Jan Kautz
- Abstract summary: We propose a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization.
We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models.
To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels.
- Score: 102.29977384039805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalizing flows, autoregressive models, variational autoencoders (VAEs),
and deep energy-based models are among competing likelihood-based frameworks
for deep generative learning. Among them, VAEs have the advantage of fast and
tractable sampling and easy-to-access encoding networks. However, they are
currently outperformed by other models such as normalizing flows and
autoregressive models. While the majority of the research in VAEs is focused on
the statistical challenges, we explore the orthogonal direction of carefully
designing neural architectures for hierarchical VAEs. We propose Nouveau VAE
(NVAE), a deep hierarchical VAE built for image generation using depth-wise
separable convolutions and batch normalization. NVAE is equipped with a
residual parameterization of Normal distributions and its training is
stabilized by spectral regularization. We show that NVAE achieves
state-of-the-art results among non-autoregressive likelihood-based models on
the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong
baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art
from 2.98 to 2.91 bits per dimension, and it produces high-quality images on
CelebA HQ. To the best of our knowledge, NVAE is the first successful VAE
applied to natural images as large as 256$\times$256 pixels. The source code is
available at https://github.com/NVlabs/NVAE .
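The abstract's "residual parameterization of Normal distributions" can be made concrete with a short sketch. The snippet below is a minimal PyTorch illustration under our own naming, not code from the linked NVlabs repository: it assumes each latent group's approximate posterior is expressed relative to its prior, q(z|x) = N(mu_p + Δmu, sigma_p · Δsigma), so the encoder only predicts the residual terms.

```python
import torch

def residual_normal_kl_and_sample(mu_p, log_sigma_p, delta_mu, delta_log_sigma):
    """Sketch of a residually parameterized Normal posterior.

    Prior:      p(z)   = N(mu_p, sigma_p)
    Posterior:  q(z|x) = N(mu_p + delta_mu, sigma_p * exp(delta_log_sigma))
    The encoder only has to output the residuals (delta_mu, delta_log_sigma).
    """
    mu_q = mu_p + delta_mu
    log_sigma_q = log_sigma_p + delta_log_sigma

    # Reparameterized sample z ~ q(z|x).
    z = mu_q + torch.exp(log_sigma_q) * torch.randn_like(mu_q)

    # Gaussian KL(q || p), rewritten so it depends only on the residual terms
    # (and the prior scale), not on the prior mean.
    kl = 0.5 * (delta_mu.pow(2) * torch.exp(-2.0 * log_sigma_p)
                + torch.exp(2.0 * delta_log_sigma)
                - 2.0 * delta_log_sigma
                - 1.0)
    return z, kl
```

The spectral regularization mentioned in the abstract would be an additional penalty on the convolution weights (for example, on their largest singular values) and is not shown here.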
Related papers
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation [50.42746357450949]
We develop deep context fusion, which propagates context information from low-scale to high-scale patches in a hierarchical manner.
We also propose adaptive computation, which allocates more network capacity and computation towards coarse image details.
The resulting model sets a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in class-conditional video generation.
arXiv Detail & Related papers (2024-06-12T01:12:53Z)
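The "deep context fusion" in the hierarchical patch diffusion entry above can be read, in its simplest form, as cropping and upsampling the feature map of a coarser-scale patch so it aligns with a finer-scale patch, then fusing the two along channels. The sketch below is our own generic illustration of that reading; the function name, arguments, and the concatenation choice are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def fuse_low_scale_context(high_feats, low_feats, patch_box):
    """Fuse context from a coarse (low-scale) feature map into the features
    of a high-scale patch: crop the matching region, resize it to the patch
    resolution, and concatenate along the channel dimension.

    high_feats: (B, C_h, H, W) features of the high-resolution patch.
    low_feats:  (B, C_l, H2, W2) features covering the whole coarse frame.
    patch_box:  (y0, y1, x0, x1) location of the patch inside low_feats.
    """
    y0, y1, x0, x1 = patch_box
    context = low_feats[:, :, y0:y1, x0:x1]                         # crop matching region
    context = F.interpolate(context, size=high_feats.shape[-2:],
                            mode="bilinear", align_corners=False)   # align resolutions
    return torch.cat([high_feats, context], dim=1)                  # channel-wise fusion
```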
- Variational Bayes image restoration with compressive autoencoders [4.879530644978008]
Regularization of inverse problems is of paramount importance in computational imaging.
In this work, we first propose to use compressive autoencoders instead of state-of-the-art generative models.
As a second contribution, we introduce the Variational Bayes Latent Estimation (VBLE) algorithm.
arXiv Detail & Related papers (2023-11-29T15:49:31Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Denoising Masked AutoEncoders are Certifiable Robust Vision Learners [37.04863068273281]
We propose a new self-supervised method called Denoising Masked AutoEncoders (DMAE).
DMAE corrupts each image by adding Gaussian noise to each pixel value and randomly masking several patches.
A Transformer-based encoder-decoder model is then trained to reconstruct the original image from the corrupted one.
arXiv Detail & Related papers (2022-10-10T12:37:59Z)
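The corruption described in the DMAE entry above (per-pixel Gaussian noise plus random patch masking) is straightforward to write down. The snippet below is a minimal sketch with assumed names and hyperparameters (noise_std, mask_ratio, patch_size are ours), not the authors' code; the Transformer encoder-decoder that reconstructs the clean image is omitted.

```python
import torch

def dmae_style_corrupt(images, noise_std=0.25, mask_ratio=0.5, patch_size=16):
    """Corrupt images with per-pixel Gaussian noise plus random patch masking.

    images: (B, C, H, W) tensor with H and W divisible by patch_size.
    Returns the corrupted images and the boolean patch mask (True = masked).
    """
    b, c, h, w = images.shape
    noisy = images + noise_std * torch.randn_like(images)   # additive Gaussian noise

    # Randomly select patches to mask (zeroed out here for simplicity).
    gh, gw = h // patch_size, w // patch_size
    patch_mask = torch.rand(b, gh, gw) < mask_ratio
    pixel_mask = patch_mask.repeat_interleave(patch_size, dim=1) \
                           .repeat_interleave(patch_size, dim=2)  # (B, H, W)
    corrupted = noisy.masked_fill(pixel_mask.unsqueeze(1), 0.0)
    return corrupted, patch_mask
```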
- Lossy Image Compression with Quantized Hierarchical VAEs [33.173021636656465]
ResNet VAEs were originally designed for data (image) distribution modeling.
We present a powerful and efficient model that outperforms previous methods on natural image lossy compression.
Our model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding.
arXiv Detail & Related papers (2022-08-27T17:15:38Z)
- Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling [79.15521784128102]
We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs).
In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way.
We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation.
arXiv Detail & Related papers (2021-03-16T07:01:08Z)
- Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images [9.667538864515285]
We present a hierarchical VAE that, for the first time, generates samples quickly while outperforming the PixelCNN in log-likelihood on all natural image benchmarks.
In theory, VAEs can represent autoregressive models, as well as faster, better models if they exist, when made sufficiently deep.
arXiv Detail & Related papers (2020-11-20T21:35:31Z)
- A Contrastive Learning Approach for Training Variational Autoencoder Priors [137.62674958536712]
Variational autoencoders (VAEs) are among the most powerful likelihood-based generative models, with applications in many domains.
One explanation for VAEs' poor generative quality is the prior hole problem: the prior distribution fails to match the aggregate approximate posterior.
We propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
arXiv Detail & Related papers (2020-10-06T17:59:02Z)
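The energy-based prior described in the entry above, the product of a base prior and a reweighting factor, can be written as an unnormalized log-density: log p(z) = log p_base(z) − E(z) − log Z. The sketch below is our own minimal rendering of that idea; the class name and the small MLP used as the reweighting network are assumptions for illustration, not the paper's model, and the normalizing constant log Z is left to the training procedure.

```python
import torch
import torch.nn as nn

class ReweightedPrior(nn.Module):
    """Unnormalized energy-based prior: p(z) proportional to p_base(z) * exp(-E(z)).

    p_base is a standard Normal here; reweight_net outputs the energy E(z).
    """

    def __init__(self, latent_dim, hidden=256):
        super().__init__()
        self.reweight_net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def unnormalized_log_prob(self, z):
        base = torch.distributions.Normal(0.0, 1.0)
        log_p_base = base.log_prob(z).sum(dim=-1)      # log-density of the base prior
        energy = self.reweight_net(z).squeeze(-1)      # learned reweighting factor
        return log_p_base - energy                     # log p(z) up to log Z
```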
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
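The locally masked convolution described in the entry above applies a different mask to the convolution weights at every spatial location. One standard way to sketch this is with an im2col-style unfold: extract patches, zero out entries per location according to the masks, then apply the shared weights. The snippet below is our own illustration of that idea, not the authors' implementation; names and argument conventions are assumed.

```python
import torch
import torch.nn.functional as F

def locally_masked_conv2d(x, weight, masks):
    """2D convolution whose weights are masked differently at every location.

    x:      (B, C_in, H, W) input.
    weight: (C_out, C_in, k, k) shared convolution weights (odd k).
    masks:  (H*W, C_in*k*k) binary mask for each output location's receptive
            field; for binary masks, masking the unfolded input patches is
            equivalent to masking the weights at that location.
    """
    b, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape

    patches = F.unfold(x, kernel_size=k, padding=(k - 1) // 2)  # (B, C_in*k*k, H*W)
    patches = patches * masks.t().unsqueeze(0)                  # apply per-location masks
    out = weight.view(c_out, -1) @ patches                      # shared weights
    return out.view(b, c_out, h, w)
```

For an autoregressive generation order, the masks would be constructed so that each location's receptive field covers only already-generated pixels.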
- Neuromorphologicaly-preserving Volumetric data encoding using VQ-VAE [4.221619479687068]
We show a VQ-VAE inspired network can efficiently encode a full-resolution 3D brain volume, compressing the data to 0.825% of the original size while maintaining image fidelity.
We then demonstrate that VQ-VAE decoded images preserve the morphological characteristics of the original data through voxel-based morphology and segmentation experiments.
arXiv Detail & Related papers (2020-02-13T18:18:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.