Using latent space regression to analyze and leverage compositionality in GANs
- URL: http://arxiv.org/abs/2103.10426v1
- Date: Thu, 18 Mar 2021 17:58:01 GMT
- Title: Using latent space regression to analyze and leverage compositionality in GANs
- Authors: Lucy Chai, Jonas Wulff, Phillip Isola
- Abstract summary: We investigate regression into the latent space as a probe to understand the compositional properties of GANs.
We find that combining the regressor and a pretrained generator provides a strong image prior, allowing us to create composite images.
We find that the regression approach enables more localized editing of individual image parts compared to direct editing in the latent space.
- Score: 33.381584322411626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, Generative Adversarial Networks have become ubiquitous in
both research and public perception, but how GANs convert an unstructured
latent code to a high quality output is still an open question. In this work,
we investigate regression into the latent space as a probe to understand the
compositional properties of GANs. We find that combining the regressor and a
pretrained generator provides a strong image prior, allowing us to create
composite images from a collage of random image parts at inference time while
maintaining global consistency. To compare compositional properties across
different generators, we measure the trade-offs between reconstruction of the
unrealistic input and image quality of the regenerated samples. We find that
the regression approach enables more localized editing of individual image
parts compared to direct editing in the latent space, and we conduct
experiments to quantify this independence effect. Our method is agnostic to the
semantics of edits, and does not require labels or predefined concepts during
training. Beyond image composition, our method extends to a number of related
applications, such as image inpainting or example-based image editing, which we
demonstrate on several GANs and datasets, and because it uses only a single
forward pass, it can operate in real-time. Code is available on our project
page: https://chail.github.io/latent-composition/.
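For intuition, the pipeline reduces to one encode-decode pass. Below is a minimal sketch assuming a pretrained generator `G` (latent code to image) and the learned latent regressor `E` (image to latent code); all names are illustrative, and the authors' actual implementation is on the project page linked above.

```python
import torch

def compose_collage(G, E, parts, masks):
    """Paste image `parts` into a rough collage using binary `masks`,
    then regress the collage to a latent code and regenerate it."""
    collage = torch.zeros_like(parts[0])
    for part, mask in zip(parts, masks):
        collage = mask * part + (1.0 - mask) * collage
    with torch.no_grad():
        z = E(collage)   # regress the unrealistic collage into the latent space
        return G(z)      # the generator prior enforces global consistency
```

Since both `E` and `G` run feed-forward, the composite comes from a single pass, consistent with the real-time claim in the abstract.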
Related papers
- MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation [54.64194935409982]
We introduce MuLAn: a novel dataset comprising over 44K MUlti-Layer-wise RGBA decompositions.
MuLAn is the first photorealistic resource providing instance decomposition and spatial information for high quality images.
We aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions.
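To make "multi-layer RGBA decomposition" concrete: instance layers stored as RGBA images can be recombined with standard alpha compositing. A generic sketch, not tied to MuLAn's actual file format:

```python
import numpy as np

def alpha_over(layers):
    """Composite ordered RGBA layers (background first, H x W x 4 arrays
    with values in [0, 1]) using the standard 'over' operator."""
    out = np.zeros_like(layers[0][..., :3])
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out
    return out
```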
arXiv Detail & Related papers (2024-04-03T14:58:00Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
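A minimal sketch of how a domain-guided encoder and domain-regularized optimization can fit together, assuming differentiable PyTorch modules `G` and `E`; the loss weights and step count are illustrative, and the paper's perceptual term is omitted:

```python
import torch
import torch.nn.functional as F

def in_domain_invert(G, E, x, steps=200, lam=2.0, lr=0.01):
    """Start from the domain-guided encoder's code E(x), then optimize the
    code so the reconstruction matches x while the encoder itself still
    'recognizes' the code, keeping it in-domain."""
    z = E(x).detach().clone().requires_grad_(True)   # domain-guided start
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        recon = G(z)
        loss = F.mse_loss(recon, x) + lam * F.mse_loss(E(recon), z)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```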
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Rethinking Polyp Segmentation from an Out-of-Distribution Perspective [37.1338930936671]
We leverage the ability of masked autoencoders -- self-supervised vision transformers trained on a reconstruction task -- to learn in-distribution representations.
We perform out-of-distribution reconstruction and inference, with feature space standardisation to align the latent distribution of the diverse abnormal samples with the statistics of the healthy samples.
Experimental results on six benchmarks show that our model has excellent segmentation performance and generalises across datasets.
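A hedged sketch of the idea, with a hypothetical encoder/decoder pair standing in for the masked autoencoder and (mu_h, sigma_h) for healthy-set feature statistics; the paper's exact standardisation may differ:

```python
import torch

def segment_anomaly(encoder, decoder, x, mu_h, sigma_h, tau=0.3):
    """Encode the image with a backbone trained on healthy data only,
    standardise the features toward healthy statistics so decoding yields a
    'healthy' version of the input, then threshold the per-pixel error."""
    f = encoder(x)
    f = (f - f.mean()) / (f.std() + 1e-6)              # standardise features
    f = f * sigma_h + mu_h                             # align with healthy stats
    recon = decoder(f)
    err = (x - recon).abs().mean(dim=1, keepdim=True)  # per-pixel residual
    return (err > tau).float()                         # anomaly (e.g. polyp) mask
```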
arXiv Detail & Related papers (2023-06-13T14:13:16Z)
- In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing [28.790900756506833]
3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts.
GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code.
Inversion becomes difficult, however, when the input contains out-of-distribution (OOD) objects that the pre-trained GAN cannot represent; we address this issue by explicitly modeling OOD objects from the input in 3D-aware GANs.
arXiv Detail & Related papers (2023-02-09T18:59:56Z)
- Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN-prior-based editing framework that tackles the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can be composited from the paired original and edited images.
In the decomposition phase, we further present a GAN-prior-based deghosting network for separating the final fine edited image from the coarse reconstruction.
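The composition phase amounts to mask-guided blending; a minimal sketch with a hypothetical soft mask:

```python
def coarse_composite(original, edited, mask):
    """Paste the edited region, given by a Diff-CAM-style soft mask in [0, 1],
    onto the original image; the deghosting network described above would
    then refine this coarse reconstruction."""
    return mask * edited + (1.0 - mask) * original

# e.g. with a single-channel soft mask broadcast over RGB:
# out = coarse_composite(orig, edit, cam_mask.clamp(0, 1))
```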
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
- A Method for Evaluating Deep Generative Models of Images via Assessing the Reproduction of High-order Spatial Context [9.00018232117916]
Generative adversarial networks (GANs) are one widely employed kind of deep generative model (DGM).
In this work, we demonstrate several objective tests of images output by two popular GAN architectures.
We designed several stochastic context models (SCMs) of distinct image features that can be recovered after generation by a trained GAN.
arXiv Detail & Related papers (2021-11-24T15:58:10Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
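A sketch of the setup, assuming an encoder `E` that inverts images into the generator's latent space; the perturbation scheme below is illustrative, not the paper's exact augmentation strategy:

```python
import torch

@torch.no_grad()
def ensemble_logits(classifier, G, E, x, n_views=8, sigma=0.1):
    """Invert the input image, perturb its latent code to synthesize nearby
    'views' with a StyleGAN2-like generator G, and average the classifier's
    logits over the original image and its generated views."""
    z = E(x)
    logits = classifier(x)
    for _ in range(n_views):
        view = G(z + sigma * torch.randn_like(z))
        logits = logits + classifier(view)
    return logits / (n_views + 1)
```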
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
To feed a real image into a trained GAN generator, a common practice is to first invert it back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures the inverted code to be semantically meaningful for editing.
arXiv Detail & Related papers (2020-03-31T18:20:18Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
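A sketch of what a dense combination of dilated convolutions can look like; channel counts and dilation rates are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Parallel 3x3 branches with increasing dilation rates enlarge the
    receptive field; a 1x1 convolution fuses the concatenated branches."""

    def __init__(self, channels=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))
```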
arXiv Detail & Related papers (2020-02-07T03:45:25Z)