A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing
- URL: http://arxiv.org/abs/2312.08256v1
- Date: Wed, 13 Dec 2023 16:18:45 GMT
- Title: A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing
- Authors: Gwilherm Lesné, Yann Gousseau, Saïd Ladjal, Alasdair Newson
- Abstract summary: We propose an auto-encoder which re-organizes the latent space of StyleGAN, so that each attribute which we wish to edit corresponds to an axis of the new latent space.
We show that our approach has greater disentanglement than competing methods, while maintaining fidelity to the original image with respect to identity.
- Score: 4.8201607588546
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in the field of generative models, and in particular
generative adversarial networks (GANs), have led to substantial progress in
controlled image editing, especially compared with the pre-deep-learning era.
Despite their powerful ability to apply realistic modifications to an image,
these methods often lack properties like disentanglement (the capacity to edit
attributes independently). In this paper, we propose an auto-encoder which
re-organizes the latent space of StyleGAN, so that each attribute which we wish
to edit corresponds to an axis of the new latent space, and furthermore that
the latent axes are decorrelated, encouraging disentanglement. We work in a
compressed version of the latent space, using Principal Component Analysis,
meaning that the parameter complexity of our autoencoder is reduced, leading to
short training times (~45 minutes). Qualitative and quantitative results
demonstrate the editing capabilities of our approach, with greater
disentanglement than competing methods, while maintaining fidelity to the
original image with respect to identity. Our autoencoder architecture is simple
and straightforward, facilitating implementation.
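
To make the method concrete, here is a minimal sketch (not the authors' code) of the idea described above: an autoencoder over PCA-compressed StyleGAN w-latents, with an alignment term tying each attribute we wish to edit to one latent axis and a covariance penalty decorrelating the axes. All names, dimensions, and loss weights are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeAutoencoder(nn.Module):
    """Small MLP autoencoder over PCA-compressed StyleGAN w-latents."""
    def __init__(self, pca_dim=512, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(pca_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, pca_dim))

    def forward(self, w_pca):
        z = self.enc(w_pca)
        return z, self.dec(z)

def decorrelation_loss(z):
    # Penalize off-diagonal entries of the latent covariance matrix,
    # pushing the axes to vary independently (disentanglement).
    z = z - z.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

def training_step(model, opt, w_pca, attrs, k):
    # w_pca: PCA projections of StyleGAN w-latents; attrs: labels for the
    # first k latent axes (e.g. smile, age). The 0.01 weight is a guess.
    z, recon = model(w_pca)
    loss = (F.mse_loss(recon, w_pca)          # reconstruction fidelity
            + F.mse_loss(z[:, :k], attrs)     # one attribute per axis
            + 0.01 * decorrelation_loss(z))   # decorrelated axes
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Editing would then amount to shifting a single coordinate of z before decoding and mapping back through the inverse PCA projection; working with such small MLPs is consistent with the short training times reported above.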
Related papers
- LCM-Lookahead for Encoder-based Text-to-Image Personalization [82.56471486184252]
We explore the potential of using shortcut-mechanisms to guide the personalization of text-to-image models.
We focus on encoder-based personalization approaches, and demonstrate that by tuning them with a lookahead identity loss, we can achieve higher identity fidelity.
arXiv Detail & Related papers (2024-04-04T17:43:06Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout or by moving or removing objects. (A toy sketch of such heatmaps appears after this list.)
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Overparameterization Improves StyleGAN Inversion [66.8300251627992]
Existing inversion approaches obtain promising yet imperfect results.
We show that overparameterizing the latent space allows us to obtain near-perfect image reconstruction without the need for encoders.
Our approach also retains editability, which we demonstrate by realistically interpolating between images.
arXiv Detail & Related papers (2022-05-12T18:42:43Z)
- High-fidelity GAN Inversion with Padding Space [38.9258619444968]
Inverting a Generative Adversarial Network (GAN) facilitates a wide range of image editing tasks using pre-trained generators.
Existing methods typically employ the latent space of GANs as the inversion space yet observe the insufficient recovery of spatial details.
We propose to involve the padding space of the generator to complement the latent space with spatial information.
arXiv Detail & Related papers (2022-03-21T16:32:12Z)
- Delta-GAN-Encoder: Encoding Semantic Changes for Explicit Image Editing, using Few Synthetic Samples [2.348633570886661]
We propose a novel method for learning to control any desired attribute in a pre-trained GAN's latent space.
We perform Sim2Real learning, relying on minimal samples to achieve an unlimited number of continuous, precise edits.
arXiv Detail & Related papers (2021-11-16T12:42:04Z)
- Designing an Encoder for StyleGAN Image Manipulation [38.909059126878354]
We study the latent space of StyleGAN, the state-of-the-art unconditional generator.
We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space.
We present an encoder based on our two principles that is specifically designed for facilitating editing on real images.
arXiv Detail & Related papers (2021-02-04T17:52:38Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with a few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity preservation energy term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common first step when editing a real image with a trained GAN generator is to invert the image back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures the inverted code is semantically meaningful for editing. (A simplified sketch of this optimization appears after this list.)
arXiv Detail & Related papers (2020-03-31T18:20:18Z)
- Toward a Controllable Disentanglement Network [22.968760397814993]
This paper addresses two crucial problems of learning disentangled image representations, namely controlling the degree of disentanglement during image editing, and balancing the disentanglement strength and the reconstruction quality.
By exploring the real-valued space of the soft target representation, we are able to synthesize novel images with the designated properties.
arXiv Detail & Related papers (2020-01-22T16:54:07Z)
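
As referenced in the "Spatial Steerability of GANs" entry above, here is a toy sketch of randomly sampled Gaussian heatmaps of the kind that could be injected into intermediate generator layers as a spatial inductive bias; the function name, blob count, and sigma are assumptions, not the paper's implementation.

```python
import torch

def random_gaussian_heatmaps(batch, n_blobs, size, sigma=0.1):
    # Gaussian "blobs" at random centers on a [0, 1] x [0, 1] grid.
    ys, xs = torch.meshgrid(torch.linspace(0, 1, size),
                            torch.linspace(0, 1, size), indexing="ij")
    centers = torch.rand(batch, n_blobs, 2)  # random (y, x) centers, (B, K, 2)
    d2 = ((ys[None, None] - centers[..., 0, None, None]) ** 2
          + (xs[None, None] - centers[..., 1, None, None]) ** 2)
    maps = torch.exp(-d2 / (2 * sigma ** 2))  # (B, K, size, size)
    return maps.amax(dim=1, keepdim=True)     # merge blobs into one channel

heatmaps = random_gaussian_heatmaps(batch=4, n_blobs=3, size=64)  # (4, 1, 64, 64)
```

Moving a blob's center between two calls would correspond to the kind of layout edit the paper describes.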
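
And, as referenced in the "In-Domain GAN Inversion" entry, a simplified sketch of encoder-regularized inversion: it assumes a pretrained generator G and encoder E, initializes the code from the encoder, and omits the perceptual loss the paper also uses; the weight lam is a guess.

```python
import torch

def in_domain_invert(G, E, img, steps=500, lam=2.0, lr=0.01):
    # Optimize the code to reconstruct `img` in pixel space while an
    # encoder consistency term keeps the code semantically "in-domain".
    w = E(img).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        rec = G(w)
        loss = ((rec - img) ** 2).mean() + lam * ((E(rec) - w) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```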