Spatial Steerability of GANs via Self-Supervision from Discriminator
- URL: http://arxiv.org/abs/2301.08455v2
- Date: Tue, 9 Jan 2024 18:41:41 GMT
- Title: Spatial Steerability of GANs via Self-Supervision from Discriminator
- Authors: Jianyuan Wang, Lalit Bhagat, Ceyuan Yang, Yinghao Xu, Yujun Shen,
Hongdong Li, Bolei Zhou
- Abstract summary: We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
- Score: 123.27117057804732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models have made huge progress in photorealistic image synthesis
in recent years. To enable humans to steer the image generation process and
customize the output, many works explore the interpretable dimensions of the
latent space in GANs. Existing methods edit the attributes of the output image
such as orientation or color scheme by varying the latent code along certain
directions. However, these methods usually require additional human annotations
for each pretrained model, and they mostly focus on editing global attributes.
In this work, we propose a self-supervised approach to improve the spatial
steerability of GANs without searching for steerable directions in the latent
space or requiring extra annotations. Specifically, we design randomly sampled
Gaussian heatmaps to be encoded into the intermediate layers of generative
models as spatial inductive bias. While training the GAN model from scratch,
these heatmaps are aligned with the emerging attention of the GAN's
discriminator in a self-supervised manner. During inference,
users can interact with the spatial heatmaps in an intuitive manner, enabling
them to edit the output image by adjusting the scene layout, moving, or
removing objects. Moreover, we incorporate DragGAN into our framework, which
facilitates fine-grained manipulation within a reasonable time and supports a
coarse-to-fine editing process. Extensive experiments show that the proposed
method not only enables spatial editing over human faces, animal faces, outdoor
scenes, and complicated multi-object indoor scenes but also brings improvement
in synthesis quality. Code, models, and demo video are available at
https://genforce.github.io/SpatialGAN/.
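The heatmap mechanism described in the abstract can be illustrated with a short sketch. The snippet below is only a minimal illustration of the idea, not the authors' implementation: the feature resolution, the 1x1-convolution used to inject the heatmap into intermediate generator features, and the use of a normalized, channel-averaged discriminator feature map as its "attention" are all assumptions made for this example.

    # Minimal sketch of the idea, assuming PyTorch; not the SpatialGAN code.
    import torch
    import torch.nn.functional as F


    def sample_gaussian_heatmap(size: int, num_points: int = 1) -> torch.Tensor:
        """Randomly sampled Gaussian heatmap(s) on a size x size grid, shape (1, 1, H, W)."""
        ys, xs = torch.meshgrid(
            torch.linspace(0, 1, size), torch.linspace(0, 1, size), indexing="ij"
        )
        heatmap = torch.zeros(size, size)
        for _ in range(num_points):
            cx, cy = torch.rand(2)                # random center of the Gaussian blob
            sigma = 0.05 + 0.15 * torch.rand(1)   # random spread
            heatmap += torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        return heatmap.clamp(max=1.0).unsqueeze(0).unsqueeze(0)


    # Inject the heatmap into an intermediate generator feature map as a spatial bias.
    # The 1x1-conv-and-add injection is an assumption for this sketch; the paper's
    # exact encoding scheme may differ.
    feat = torch.randn(1, 512, 16, 16)            # hypothetical intermediate generator feature
    heatmap = sample_gaussian_heatmap(16)
    encode = torch.nn.Conv2d(1, 512, kernel_size=1)
    feat_with_bias = feat + encode(heatmap)

    # Self-supervised alignment: pull the injected heatmap toward the discriminator's
    # emerging spatial attention, approximated here by a normalized, channel-averaged
    # intermediate discriminator feature map.
    disc_feat = torch.randn(1, 256, 16, 16)       # hypothetical intermediate discriminator feature
    attention = disc_feat.abs().mean(dim=1, keepdim=True)
    attention = attention / (attention.amax(dim=(2, 3), keepdim=True) + 1e-8)
    align_loss = F.mse_loss(heatmap, attention)

During inference, the same heatmaps become the user-facing handle: moving a Gaussian center, adding a blob, or zeroing a region corresponds to moving, adding, or removing content in the output, under the assumptions stated above.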
Related papers
- Move Anything with Layered Scene Diffusion [77.45870343845492]
We propose SceneDiffusion to optimize a layered scene representation during the diffusion sampling process.
Our key insight is that spatial disentanglement can be obtained by jointly denoising scene renderings at different spatial layouts.
Our generated scenes support a wide range of spatial editing operations, including moving, resizing, cloning, and layer-wise appearance editing operations.
arXiv Detail & Related papers (2024-04-10T17:28:16Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - High-fidelity GAN Inversion with Padding Space [38.9258619444968]
Inverting a Generative Adversarial Network (GAN) facilitates a wide range of image editing tasks using pre-trained generators.
Existing methods typically employ the latent space of GANs as the inversion space, yet observe insufficient recovery of spatial details.
We propose to involve the padding space of the generator to complement the latent space with spatial information.
arXiv Detail & Related papers (2022-03-21T16:32:12Z) - InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z) - Mask-Guided Discovery of Semantic Manifolds in Generative Models [0.0]
StyleGAN2 generates images of human faces from random vectors in a lower-dimensional latent space.
The model behaves as a black box, providing neither control over its output nor insight into the structures it has learned from the data.
We present a method to explore the manifold of changes of spatially localized regions of the face.
arXiv Detail & Related papers (2021-05-15T18:06:38Z) - Navigating the GAN Parameter Space for Semantic Image Editing [35.622710993417456]
Generative Adversarial Networks (GANs) are an indispensable tool for visual editing.
In this paper, we significantly expand the range of visual effects achievable with state-of-the-art models, such as StyleGAN2.
arXiv Detail & Related papers (2020-11-27T15:38:56Z) - Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z) - InterFaceGAN: Interpreting the Disentangled Face Representation Learned
by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.