Semantic Unfolding of StyleGAN Latent Space
- URL: http://arxiv.org/abs/2206.14892v1
- Date: Wed, 29 Jun 2022 20:22:10 GMT
- Title: Semantic Unfolding of StyleGAN Latent Space
- Authors: Mustafa Shukor, Xu Yao, Bharath Bushan Damodaran, Pierre Hellier
- Abstract summary: Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to an input real image.
This editing property emerges from the disentangled nature of the latent space.
In this paper, we show that facial attribute disentanglement is not optimal, and hence facial editing that relies on linear attribute separation is flawed.
- Score: 0.7646713951724012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative adversarial networks (GANs) have proven to be surprisingly
efficient for image editing by inverting and manipulating the latent code
corresponding to an input real image. This editing property emerges from the
disentangled nature of the latent space. In this paper, we show that facial
attribute disentanglement is not optimal, and hence facial editing that relies
on linear attribute separation is flawed. We therefore propose to improve
semantic disentanglement with supervision. Our method learns a proxy latent
representation using normalizing flows, and we show that this leads to a more
efficient space for face image editing.
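As a rough illustration of the proposed approach, the sketch below implements a RealNVP-style coupling flow that maps a StyleGAN latent code into a proxy space, edits along a linear attribute direction there, and maps back. The layer design, dimensions, and all names are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer: half the dims condition an
    affine transform of the other half, keeping the map invertible."""
    def __init__(self, dim=512, hidden=1024):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * self.half),
        )

    def forward(self, w):
        w1, w2 = w[:, :self.half], w[:, self.half:]
        log_s, t = self.net(w1).chunk(2, dim=1)
        z2 = w2 * torch.exp(log_s) + t          # invertible affine map
        return torch.cat([w1, z2], dim=1)

    def inverse(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(z1).chunk(2, dim=1)
        w2 = (z2 - t) * torch.exp(-log_s)
        return torch.cat([z1, w2], dim=1)

class ProxyFlow(nn.Module):
    """Stack of couplings with permutations, mapping W -> proxy space."""
    def __init__(self, dim=512, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(AffineCoupling(dim) for _ in range(n_layers))
        self.perms = [torch.randperm(dim) for _ in range(n_layers)]

    def forward(self, w):
        for layer, p in zip(self.layers, self.perms):
            w = layer(w[:, p])
        return w

    def inverse(self, z):
        for layer, p in zip(reversed(self.layers), reversed(self.perms)):
            z = layer.inverse(z)
            z = z[:, torch.argsort(p)]          # undo the permutation
        return z

# Editing: move along a linear attribute direction in the proxy space,
# then map back to W for the (frozen) StyleGAN generator.
flow = ProxyFlow()
w = torch.randn(1, 512)           # stand-in for an inverted latent code
direction = torch.randn(512)      # stand-in for a learned attribute axis
w_edit = flow.inverse(flow(w) + 1.5 * direction)
```

Because each coupling layer is exactly invertible, an edit made in the proxy space maps back to a valid latent code for the frozen generator.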
Related papers
- Editable Image Elements for Controllable Synthesis [79.58148778509769]
We propose an image representation that promotes spatial editing of input images using a diffusion model.
We show the effectiveness of our representation on various image editing tasks, such as object resizing, rearrangement, dragging, de-occlusion, removal, variation, and image composition.
arXiv Detail & Related papers (2024-04-24T17:59:11Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
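A minimal sketch of the adapter idea: project a $\mathcal{W}_+$ identity code into extra conditioning tokens that a frozen diffusion UNet attends to alongside the text prompt. The token count, dimensions, and module names are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class WPlusAdapter(nn.Module):
    """Project a StyleGAN W+ code (n_styles x 512) into a few conditioning
    tokens shaped like text embeddings, so a frozen diffusion UNet can
    attend to identity via its existing cross-attention layers."""
    def __init__(self, n_styles=18, w_dim=512, ctx_dim=768, n_tokens=4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(n_styles * w_dim, 1024), nn.GELU(),
            nn.Linear(1024, n_tokens * ctx_dim),
        )
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim

    def forward(self, w_plus, text_ctx):
        # w_plus: (B, n_styles, 512); text_ctx: (B, seq_len, ctx_dim)
        id_tokens = self.proj(w_plus.flatten(1)).view(-1, self.n_tokens, self.ctx_dim)
        # Appending keeps the prompt editable while injecting identity.
        return torch.cat([text_ctx, id_tokens], dim=1)

adapter = WPlusAdapter()
w_plus = torch.randn(1, 18, 512)    # stand-in for an e4e/pSp-style inversion
text_ctx = torch.randn(1, 77, 768)  # stand-in for CLIP text embeddings
ctx = adapter(w_plus, text_ctx)     # feed to the UNet's cross-attention
```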
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- Face Attribute Editing with Disentangled Latent Vectors [0.0]
We propose an image-to-image translation framework for facial attribute editing.
Inspired by latent space factorization works on fixed pretrained GANs, we design the attribute editing via latent space factorization.
To project images to semantically organized latent spaces, we set an encoder-decoder architecture with attention-based skip connections.
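A minimal sketch of an attention-gated skip connection of the kind described above; the gating design and all names are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionSkip(nn.Module):
    """Gate encoder features before they skip into the decoder, so the
    decoder can suppress regions the edit is meant to overwrite."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),                      # per-pixel attention in [0, 1]
        )

    def forward(self, enc_feat, dec_feat):
        attn = self.gate(torch.cat([enc_feat, dec_feat], dim=1))
        return dec_feat + attn * enc_feat      # attended skip connection

skip = AttentionSkip(64)
enc = torch.randn(1, 64, 32, 32)   # encoder activation at some scale
dec = torch.randn(1, 64, 32, 32)   # decoder activation at the same scale
out = skip(enc, dec)
```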
arXiv Detail & Related papers (2023-01-11T18:32:13Z)
- Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can be composited from the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
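The composition phase reduces to a per-pixel blend under the mask; a minimal sketch (tensor shapes and names are illustrative):

```python
import torch

def composite(original, edited, diff_cam_mask):
    """Coarse reconstruction: take edited pixels where the mask is high
    (regions the edit changed) and original pixels elsewhere."""
    m = diff_cam_mask.clamp(0, 1)           # (B, 1, H, W) in [0, 1]
    return m * edited + (1 - m) * original  # per-pixel blend

original = torch.rand(1, 3, 256, 256)
edited = torch.rand(1, 3, 256, 256)
mask = torch.rand(1, 1, 256, 256)   # stand-in for a Diff-CAM activation map
coarse = composite(original, edited, mask)
```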
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
- Expanding the Latent Space of StyleGAN for Real Face Editing [4.1715767752637145]
Many face editing techniques have been proposed that employ the pretrained StyleGAN for semantic manipulation.
To successfully edit a real image, one must first convert the input image into StyleGAN's latent variables.
We present a method to expand the latent space of StyleGAN with additional content features to break the trade-off between low distortion and high editability.
arXiv Detail & Related papers (2022-04-26T18:27:53Z)
- Semantic and Geometric Unfolding of StyleGAN Latent Space [2.7910505923792646]
Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing.
In this paper, we identify two geometric limitations of this latent space.
We propose a new method to learn a proxy latent representation using normalizing flows to remedy these limitations.
arXiv Detail & Related papers (2021-07-09T15:12:55Z)
- Pivotal Tuning for Latent-based Editing of Real Images [40.22151052441958]
Many advanced facial editing techniques have been proposed that leverage the generative power of a pre-trained StyleGAN.
To successfully edit an image this way, one must first project (or invert) the image into the pre-trained generator's domain.
This means it is still challenging to apply ID-preserving facial latent-space editing to faces that are out of the generator's domain.
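A minimal sketch of the two-stage pivotal tuning recipe this entry describes: first optimize a "pivot" latent with the generator frozen, then briefly fine-tune the generator around that pivot. The optimizer settings, step counts, and the generator interface G are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pivotal_tuning(G, target, w_init, inv_steps=500, tune_steps=350):
    """Stage 1: optimize a pivot latent w with the generator frozen.
       Stage 2: freeze w and fine-tune G's weights around the pivot."""
    w = w_init.clone().requires_grad_(True)
    opt_w = torch.optim.Adam([w], lr=5e-3)
    for _ in range(inv_steps):                 # inversion: find the pivot
        opt_w.zero_grad()
        loss = F.mse_loss(G(w), target)        # plus LPIPS in practice
        loss.backward()
        opt_w.step()

    w = w.detach()                             # pivot is now fixed
    opt_g = torch.optim.Adam(G.parameters(), lr=3e-4)
    for _ in range(tune_steps):                # tune G so G(w) matches target
        opt_g.zero_grad()
        loss = F.mse_loss(G(w), target)
        loss.backward()
        opt_g.step()
    return w, G      # edit by moving w; the tuned G stays identity-faithful
```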
arXiv Detail & Related papers (2021-06-10T13:47:59Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
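Latent-space navigation in its simplest form is a linear walk along an attribute direction; the sketch below shows that step in isolation. The direction itself would come from a trained regressor or classifier, which is outside this snippet, and all names are illustrative.

```python
import torch

def navigate(w, direction, strength):
    """Move a latent code along a unit-norm attribute direction.
    Entangled directions are exactly what this line of work tries to fix."""
    d = direction / direction.norm()
    return w + strength * d

w = torch.randn(1, 512)        # inverted latent of a real image
smile = torch.randn(512)       # stand-in for a learned 'smile' direction
w_more_smile = navigate(w, smile, strength=2.0)
w_less_smile = navigate(w, smile, strength=-2.0)
```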
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity preservation energy term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice for applying a trained GAN generator to a real image is to invert the image back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach that faithfully reconstructs the input image and ensures that the inverted code is semantically meaningful for editing.
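A minimal sketch of the kind of regularized optimization described here: reconstruct pixels while an encoder pulls the code back toward the semantic domain. The loss weights, the encoder interface E, and the exact form of the domain regularizer are assumptions, not the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def in_domain_invert(G, E, target, steps=300, lam=2.0):
    """Optimize a latent to match the target image (pixel term) while an
    encoder E keeps the code near the region it considers in-domain."""
    w = E(target).detach().clone().requires_grad_(True)  # encoder init
    opt = torch.optim.Adam([w], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        recon = G(w)
        pixel_loss = F.mse_loss(recon, target)
        # Domain regularizer: re-encoding the reconstruction should land
        # near the current code if w is semantically meaningful.
        domain_loss = F.mse_loss(E(recon), w)
        (pixel_loss + lam * domain_loss).backward()
        opt.step()
    return w.detach()
```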
arXiv Detail & Related papers (2020-03-31T18:20:18Z)