NVAE: A Deep Hierarchical Variational Autoencoder
- URL: http://arxiv.org/abs/2007.03898v3
- Date: Fri, 8 Jan 2021 03:08:58 GMT
- Title: NVAE: A Deep Hierarchical Variational Autoencoder
- Authors: Arash Vahdat, Jan Kautz
- Abstract summary: We propose a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization.
We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models.
To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels.
- Score: 102.29977384039805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalizing flows, autoregressive models, variational autoencoders (VAEs),
and deep energy-based models are among competing likelihood-based frameworks
for deep generative learning. Among them, VAEs have the advantage of fast and
tractable sampling and easy-to-access encoding networks. However, they are
currently outperformed by other models such as normalizing flows and
autoregressive models. While the majority of the research in VAEs is focused on
the statistical challenges, we explore the orthogonal direction of carefully
designing neural architectures for hierarchical VAEs. We propose Nouveau VAE
(NVAE), a deep hierarchical VAE built for image generation using depth-wise
separable convolutions and batch normalization. NVAE is equipped with a
residual parameterization of Normal distributions and its training is
stabilized by spectral regularization. We show that NVAE achieves
state-of-the-art results among non-autoregressive likelihood-based models on
the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong
baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art
from 2.98 to 2.91 bits per dimension, and it produces high-quality images on
CelebA HQ. To the best of our knowledge, NVAE is the first successful VAE
applied to natural images as large as 256$\times$256 pixels. The source code is
available at https://github.com/NVlabs/NVAE .
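The abstract's "residual parameterization of Normal distributions" can be made concrete with a short sketch. The snippet below is a minimal PyTorch illustration under our own naming, not code from the linked NVlabs repository: it assumes each latent group's approximate posterior is expressed relative to its prior, q(z|x) = N(mu_p + Δmu, sigma_p · Δsigma), so the encoder only predicts the residual terms.

```python
import torch

def residual_normal_kl_and_sample(mu_p, log_sigma_p, delta_mu, delta_log_sigma):
    """Sketch of a residually parameterized Normal posterior.

    Prior:      p(z)   = N(mu_p, sigma_p)
    Posterior:  q(z|x) = N(mu_p + delta_mu, sigma_p * exp(delta_log_sigma))
    The encoder only has to output the residuals (delta_mu, delta_log_sigma).
    """
    mu_q = mu_p + delta_mu
    log_sigma_q = log_sigma_p + delta_log_sigma

    # Reparameterized sample z ~ q(z|x).
    z = mu_q + torch.exp(log_sigma_q) * torch.randn_like(mu_q)

    # Gaussian KL(q || p), rewritten so it depends only on the residual terms
    # (and the prior scale), not on the prior mean.
    kl = 0.5 * (delta_mu.pow(2) * torch.exp(-2.0 * log_sigma_p)
                + torch.exp(2.0 * delta_log_sigma)
                - 2.0 * delta_log_sigma
                - 1.0)
    return z, kl
```

The spectral regularization mentioned in the abstract would be an additional penalty on the convolution weights (for example, on their largest singular values) and is not shown here.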
Related papers
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation [50.42746357450949]
We develop deep context fusion, which propagates context information from low-scale to high-scale patches in a hierarchical manner.
We also propose adaptive computation, which allocates more network capacity and computation towards coarse image details.
The resulting model sets a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in class-conditional video generation.
arXiv Detail & Related papers (2024-06-12T01:12:53Z)
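The "deep context fusion" in the hierarchical patch diffusion entry above can be read, in its simplest form, as cropping and upsampling the feature map of a coarser-scale patch so it aligns with a finer-scale patch, then fusing the two along channels. The sketch below is our own generic illustration of that reading; the function name, arguments, and the concatenation choice are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def fuse_low_scale_context(high_feats, low_feats, patch_box):
    """Fuse context from a coarse (low-scale) feature map into the features
    of a high-scale patch: crop the matching region, resize it to the patch
    resolution, and concatenate along the channel dimension.

    high_feats: (B, C_h, H, W) features of the high-resolution patch.
    low_feats:  (B, C_l, H2, W2) features covering the whole coarse frame.
    patch_box:  (y0, y1, x0, x1) location of the patch inside low_feats.
    """
    y0, y1, x0, x1 = patch_box
    context = low_feats[:, :, y0:y1, x0:x1]                         # crop matching region
    context = F.interpolate(context, size=high_feats.shape[-2:],
                            mode="bilinear", align_corners=False)   # align resolutions
    return torch.cat([high_feats, context], dim=1)                  # channel-wise fusion
```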
- Variational Bayes image restoration with compressive autoencoders [4.879530644978008]
Regularization of inverse problems is of paramount importance in computational imaging.
In this work, we first propose to use compressive autoencoders instead of state-of-the-art generative models.
As a second contribution, we introduce the Variational Bayes Latent Estimation (VBLE) algorithm.
arXiv Detail & Related papers (2023-11-29T15:49:31Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Denoising Masked AutoEncoders are Certifiable Robust Vision Learners [37.04863068273281]
We propose a new self-supervised method called Denoising Masked AutoEncoders (DMAE).
DMAE corrupts each image by adding Gaussian noise to each pixel value and randomly masking several patches.
A Transformer-based encoder-decoder model is then trained to reconstruct the original image from the corrupted one.
arXiv Detail & Related papers (2022-10-10T12:37:59Z)
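The corruption described in the DMAE entry above (per-pixel Gaussian noise plus random patch masking) is straightforward to write down. The snippet below is a minimal sketch with assumed names and hyperparameters (noise_std, mask_ratio, patch_size are ours), not the authors' code; the Transformer encoder-decoder that reconstructs the clean image is omitted.

```python
import torch

def dmae_style_corrupt(images, noise_std=0.25, mask_ratio=0.5, patch_size=16):
    """Corrupt images with per-pixel Gaussian noise plus random patch masking.

    images: (B, C, H, W) tensor with H and W divisible by patch_size.
    Returns the corrupted images and the boolean patch mask (True = masked).
    """
    b, c, h, w = images.shape
    noisy = images + noise_std * torch.randn_like(images)   # additive Gaussian noise

    # Randomly select patches to mask (zeroed out here for simplicity).
    gh, gw = h // patch_size, w // patch_size
    patch_mask = torch.rand(b, gh, gw) < mask_ratio
    pixel_mask = patch_mask.repeat_interleave(patch_size, dim=1) \
                           .repeat_interleave(patch_size, dim=2)  # (B, H, W)
    corrupted = noisy.masked_fill(pixel_mask.unsqueeze(1), 0.0)
    return corrupted, patch_mask
```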
- Lossy Image Compression with Quantized Hierarchical VAEs [33.173021636656465]
ResNet VAEs were originally designed for data (image) distribution modeling.
We present a powerful and efficient model that outperforms previous methods on natural image lossy compression.
Our model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding.
arXiv Detail & Related papers (2022-08-27T17:15:38Z)
- Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling [79.15521784128102]
We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs).
In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way.
We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation.
arXiv Detail & Related papers (2021-03-16T07:01:08Z)
- Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images [9.667538864515285]
We present a hierarchical VAE that, for the first time, generates samples quickly while outperforming the PixelCNN in log-likelihood on all natural image benchmarks.
In theory, VAEs can represent autoregressive models, as well as faster, better models if they exist, when made sufficiently deep.
arXiv Detail & Related papers (2020-11-20T21:35:31Z)
- A Contrastive Learning Approach for Training Variational Autoencoder Priors [137.62674958536712]
Variational autoencoders (VAEs) are among the most powerful likelihood-based generative models, with applications in many domains.
One explanation for VAEs' poor generative quality is the prior hole problem: the prior distribution fails to match the aggregate approximate posterior.
We propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
arXiv Detail & Related papers (2020-10-06T17:59:02Z)
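The energy-based prior described in the entry above, the product of a base prior and a reweighting factor, can be written as an unnormalized log-density: log p(z) = log p_base(z) − E(z) − log Z. The sketch below is our own minimal rendering of that idea; the class name and the small MLP used as the reweighting network are assumptions for illustration, not the paper's model, and the normalizing constant log Z is left to the training procedure.

```python
import torch
import torch.nn as nn

class ReweightedPrior(nn.Module):
    """Unnormalized energy-based prior: p(z) proportional to p_base(z) * exp(-E(z)).

    p_base is a standard Normal here; reweight_net outputs the energy E(z).
    """

    def __init__(self, latent_dim, hidden=256):
        super().__init__()
        self.reweight_net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def unnormalized_log_prob(self, z):
        base = torch.distributions.Normal(0.0, 1.0)
        log_p_base = base.log_prob(z).sum(dim=-1)      # log-density of the base prior
        energy = self.reweight_net(z).squeeze(-1)      # learned reweighting factor
        return log_p_base - energy                     # log p(z) up to log Z
```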
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
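The locally masked convolution described in the entry above applies a different mask to the convolution weights at every spatial location. One standard way to sketch this is with an im2col-style unfold: extract patches, zero out entries per location according to the masks, then apply the shared weights. The snippet below is our own illustration of that idea, not the authors' implementation; names and argument conventions are assumed.

```python
import torch
import torch.nn.functional as F

def locally_masked_conv2d(x, weight, masks):
    """2D convolution whose weights are masked differently at every location.

    x:      (B, C_in, H, W) input.
    weight: (C_out, C_in, k, k) shared convolution weights (odd k).
    masks:  (H*W, C_in*k*k) binary mask for each output location's receptive
            field; for binary masks, masking the unfolded input patches is
            equivalent to masking the weights at that location.
    """
    b, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape

    patches = F.unfold(x, kernel_size=k, padding=(k - 1) // 2)  # (B, C_in*k*k, H*W)
    patches = patches * masks.t().unsqueeze(0)                  # apply per-location masks
    out = weight.view(c_out, -1) @ patches                      # shared weights
    return out.view(b, c_out, h, w)
```

For an autoregressive generation order, the masks would be constructed so that each location's receptive field covers only already-generated pixels.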
- Neuromorphologicaly-preserving Volumetric data encoding using VQ-VAE [4.221619479687068]
We show a VQ-VAE inspired network can efficiently encode a full-resolution 3D brain volume, compressing the data to 0.825% of the original size while maintaining image fidelity.
We then demonstrate that VQ-VAE decoded images preserve the morphological characteristics of the original data through voxel-based morphology and segmentation experiments.
arXiv Detail & Related papers (2020-02-13T18:18:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.