NP-DRAW: A Non-Parametric Structured Latent Variable Modelfor Image
Generation
- URL: http://arxiv.org/abs/2106.13435v1
- Date: Fri, 25 Jun 2021 05:17:55 GMT
- Title: NP-DRAW: A Non-Parametric Structured Latent Variable Modelfor Image
Generation
- Authors: Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, Renjie Liao
- Abstract summary: We present a non-parametric structured latent variable model for image generation, called NP-DRAW.
It sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas.
- Score: 139.8037697822064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a non-parametric structured latent variable model
for image generation, called NP-DRAW, which sequentially draws on a latent
canvas in a part-by-part fashion and then decodes the image from the canvas.
Our key contributions are as follows. 1) We propose a non-parametric prior
distribution over the appearance of image parts so that the latent variable
``what-to-draw'' per step becomes a categorical random variable. This improves
the expressiveness and greatly eases the learning compared to Gaussians used in
the literature. 2) We model the sequential dependency structure of parts via a
Transformer, which is more powerful and easier to train compared to RNNs used
in the literature. 3) We propose an effective heuristic parsing algorithm to
pre-train the prior. Experiments on MNIST, Omniglot, CIFAR-10, and CelebA show
that our method significantly outperforms previous structured image models like
DRAW and AIR and is competitive to other generic generative models. Moreover,
we show that our model's inherent compositionality and interpretability bring
significant benefits in the low-data learning regime and latent space editing.
Code is available at \url{https://github.com/ZENGXH/NPDRAW}.
Related papers
- Invariant Shape Representation Learning For Image Classification [41.610264291150706]
In this paper, we introduce a novel framework that for the first time develops invariant shape representation learning (ISRL)
Our model ISRL is designed to jointly capture invariant features in latent shape spaces parameterized by deformable transformations.
By embedding the features that are invariant with regard to target variables in different environments, our model consistently offers more accurate predictions.
arXiv Detail & Related papers (2024-11-19T03:39:43Z) - Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [52.778766190479374]
Latent-based image generative models have achieved notable success in image generation tasks.
Despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation.
We propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling.
arXiv Detail & Related papers (2024-10-16T12:13:17Z) - WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting [2.3014300466616078]
This paper diverges from vision transformers by using a computationally-efficient WaveMix-based fully convolutional architecture -- WavePaint.
It uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers.
Our model even outperforms current GAN-based architectures in CelebA-HQ dataset without using an adversarially trainable discriminator.
arXiv Detail & Related papers (2023-07-01T18:41:34Z) - Transformer-based Image Generation from Scene Graphs [11.443097632746763]
Graph-structured scene descriptions can be efficiently used in generative models to control the composition of the generated image.
Previous approaches are based on the combination of graph convolutional networks and adversarial methods for layout prediction and image generation.
We show how employing multi-head attention to encode the graph information can improve the quality of the sampled data.
arXiv Detail & Related papers (2023-03-08T14:54:51Z) - LayoutDiffuse: Adapting Foundational Diffusion Models for
Layout-to-Image Generation [24.694298869398033]
Our method trains efficiently, generates images with both high perceptual quality and layout alignment.
Our method significantly outperforms other 10 generative models based on GANs, VQ-VAE, and diffusion models.
arXiv Detail & Related papers (2023-02-16T14:20:25Z) - Uncovering the Disentanglement Capability in Text-to-Image Diffusion
Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z) - FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-07-18T07:11:28Z) - InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z) - Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.