Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation
- URL: http://arxiv.org/abs/2102.01187v2
- Date: Wed, 3 Feb 2021 07:21:18 GMT
- Title: Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation
- Authors: Peiye Zhuang, Oluwasanmi Koyejo, Alexander G. Schwing
- Abstract summary: Controllable semantic image editing enables a user to change entire image attributes with a few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
- Score: 136.53288628437355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controllable semantic image editing enables a user to change entire image
attributes with a few clicks, e.g., gradually making a summer scene look like it
was taken in winter. Classic approaches for this task use a Generative
Adversarial Net (GAN) to learn a latent space and suitable latent-space
transformations. However, current approaches often suffer from attribute edits
that are entangled, global image identity changes, and diminished
photo-realism. To address these concerns, we learn multiple attribute
transformations simultaneously, integrate attribute regression into the
training of the transformation functions, and apply a content loss and an
adversarial loss that encourage the preservation of image identity and
photo-realism. We
propose quantitative evaluation strategies for measuring controllable editing
performance, unlike prior work which primarily focuses on qualitative
evaluation. Our model permits better control for both single- and
multiple-attribute editing, while also preserving image identity and realism
during transformation. We provide empirical results for both real and synthetic
images, highlighting that our model achieves state-of-the-art performance for
targeted image manipulation.
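The three losses named in the abstract can be summarized in code. Below is a minimal PyTorch sketch, not the authors' implementation: it assumes a pretrained generator `G`, a discriminator `D`, and an attribute regressor `R`, and all names, shapes, and loss weights are illustrative.

```python
import torch
import torch.nn.functional as F

class LatentTransformer(torch.nn.Module):
    """Maps a latent code z and a vector of requested attribute changes
    alpha (one scalar per attribute) to an edited latent code."""
    def __init__(self, z_dim=512, n_attrs=8):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(z_dim + n_attrs, 512),
            torch.nn.ReLU(),
            torch.nn.Linear(512, z_dim),
        )

    def forward(self, z, alpha):
        # Edit as a residual step in latent space.
        return z + self.net(torch.cat([z, alpha], dim=1))

def editing_loss(G, D, R, T, z, alpha, lambda_attr=1.0, lambda_content=10.0):
    x = G(z)                  # original image
    x_edit = G(T(z, alpha))   # edited image
    # Adversarial term: edited images should remain photo-realistic.
    loss_adv = F.softplus(-D(x_edit)).mean()
    # Attribute regression term: the realized attribute change should
    # match the requested one.
    loss_attr = F.mse_loss(R(x_edit) - R(x), alpha)
    # Content term: everything else about the image should be preserved.
    loss_content = F.l1_loss(x_edit, x)
    return loss_adv + lambda_attr * loss_attr + lambda_content * loss_content
```

Because alpha is a vector, several attributes can be edited in a single pass, which is the "multiple attribute transformations simultaneously" part of the abstract.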
Related papers
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT that are crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
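One way to read the "vital layers" idea is as a layer-ablation study. The sketch below is an assumption about the procedure, not the paper's code: `generate` and `metric` are caller-supplied callables (e.g., a DiT sampler with one block bypassed, and a perceptual distance).

```python
def rank_layers_by_importance(generate, n_layers, metric):
    """generate(skip_layer=i) renders an image with block i bypassed
    (skip_layer=None renders normally); metric(a, b) returns a scalar.
    Both callables are assumptions supplied by the caller."""
    reference = generate(skip_layer=None)
    scores = {}
    for i in range(n_layers):
        ablated = generate(skip_layer=i)
        scores[i] = float(metric(reference, ablated))
    # Blocks whose removal changes the output most are the "vital" ones.
    return sorted(scores, key=scores.get, reverse=True)
```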
- A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing [4.8201607588546]
We propose an auto-encoder that re-organizes the latent space of StyleGAN so that each attribute we wish to edit corresponds to an axis of the new latent space.
We show that our approach has greater disentanglement than competing methods, while maintaining fidelity to the original image with respect to identity.
arXiv Detail & Related papers (2023-12-13T16:18:45Z)
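If each attribute corresponds to an axis of the re-organized latent space, editing reduces to changing one coordinate. A minimal sketch, assuming a trained encoder/decoder pair over StyleGAN's latent codes (all names are illustrative):

```python
import torch

def edit_attribute(w, encoder, decoder, attr_index, delta):
    """w: a batch of StyleGAN latent codes, shape (B, 512)."""
    c = encoder(w).clone()      # compact, disentangled code
    c[:, attr_index] += delta   # move along a single semantic axis
    return decoder(c)           # map back to StyleGAN's latent space
```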
- VecGAN: Image-to-Image Translation with Interpretable Latent Directions [4.7590051176368915]
VecGAN is an image-to-image translation framework for facial attribute editing with interpretable latent directions.
VecGAN achieves significant improvements over state-of-the-art methods for both local and global edits.
arXiv Detail & Related papers (2022-07-07T16:31:05Z)
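An "interpretable latent direction" amounts to one learned direction per attribute, with edits applied by shifting the latent code along it. A hedged sketch (the projection-style update is an assumption, not VecGAN's exact formulation):

```python
import torch

def edit_along_direction(z, direction, strength):
    """z: (B, D) latent codes; direction: (D,) learned attribute direction;
    strength: (B, 1) target scale for the attribute."""
    d = direction / direction.norm()
    current = (z * d).sum(dim=1, keepdim=True)  # projection onto d
    # Move the component along d to the requested strength; the
    # orthogonal components (other attributes) are left untouched.
    return z + (strength - current) * d
```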
- End-to-End Visual Editing with a Generatively Pre-Trained Artist [78.5922562526874]
We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change.
We propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain.
We show that different blending effects can be learned by an intuitive control of the augmentation process, with no other changes required to the model architecture.
arXiv Detail & Related papers (2022-05-03T17:59:30Z)
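The self-supervised recipe can be pictured as generating training examples by compositing augmented regions. A minimal sketch under those assumptions (the example format and the `augment` callable are illustrative, not the paper's pipeline):

```python
import torch

def make_training_example(source, driver, mask, augment):
    """source, driver: (3, H, W) image tensors in [0, 1];
    mask: (1, H, W) binary region; augment: any image->image callable."""
    region = augment(driver)                      # simulated driver edit
    target = mask * region + (1 - mask) * source  # composite ground truth
    # The editor is then trained to map (source, region, mask) -> target.
    return source, region, mask, target
```

Changing the augmentation distribution changes the blending effect the editor learns, which matches the claim that no architectural change is needed.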
- Expanding the Latent Space of StyleGAN for Real Face Editing [4.1715767752637145]
Many face editing techniques have been proposed that employ the pretrained StyleGAN for semantic manipulation.
To successfully edit a real image, one must first convert the input image into StyleGAN's latent variables.
We present a method to expand the latent space of StyleGAN with additional content features to break the trade-off between low distortion and high editability.
arXiv Detail & Related papers (2022-04-26T18:27:53Z)
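The distortion/editability trade-off is usually attacked at inversion time. A hedged sketch of inversion with an expanded latent space: both a w code and an auxiliary content feature f are optimized, with `G(w, f)` as an assumed interface rather than the paper's actual API.

```python
import torch
import torch.nn.functional as F

def invert(G, image, w_init, f_init, steps=500, lr=0.01):
    w = w_init.clone().requires_grad_(True)   # editable latent code
    f = f_init.clone().requires_grad_(True)   # extra content features
    opt = torch.optim.Adam([w, f], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(G(w, f), image)      # reconstruction only, for brevity
        loss.backward()
        opt.step()
    # Edit w afterwards; f absorbs details that w alone cannot represent.
    return w.detach(), f.detach()
```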
- One-shot domain adaptation for semantic face editing of real world images using StyleALAE [7.541747299649292]
StyleALAE is a latent-space based autoencoder that can generate photo-realistic images of high quality.
Our work ensures that the identity of the reconstructed image is the same as that of the given input image.
We further generate semantic modifications over the reconstructed image by using the latent space of the pre-trained StyleALAE model.
arXiv Detail & Related papers (2021-08-31T14:32:18Z)
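The reconstruct-then-edit pattern is simple to state in code. A minimal sketch, assuming a domain-adapted StyleALAE encoder/generator pair and a precomputed semantic direction (all names are illustrative):

```python
import torch

def semantic_edit(encoder, generator, image, direction, strength=1.0):
    w = encoder(image)                  # identity-preserving latent code
    w_edit = w + strength * direction   # e.g., a smile or age direction
    return generator(w), generator(w_edit)
```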
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity energy preservation term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)
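With StyleRig in the loop, an edit is specified in the 3D morphable model's parameter space rather than directly in latent space. A hedged sketch of that control flow (the `rignet` interface below is an assumption, not StyleRig's actual signature):

```python
import torch

def rig_edit(generator, rignet, w, new_params):
    """w: embedded latent code of the portrait; new_params: target 3DMM
    parameters (pose, expression, illumination)."""
    w_edit = rignet(w, new_params)  # latent code consistent with new_params
    return generator(w_edit)        # same identity, new pose/expression/light
```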
- Look here! A parametric learning based approach to redirect visual attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
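Because the predicted edits are a small set of global parameters applied per region, applying them is a pointwise operation. A minimal sketch with a gain/bias transform standing in for the paper's richer parameter set (an assumption):

```python
import torch

def apply_params(image, gain, bias):
    # One global photometric transform; the real parameter set is richer.
    return (gain * image + bias).clamp(0.0, 1.0)

def redirect_attention(image, mask, fg_params, bg_params):
    """image: (3, H, W) in [0, 1]; mask: (1, H, W) foreground mask;
    fg_params/bg_params: (gain, bias) pairs predicted by the model."""
    fg = apply_params(image, *fg_params)   # e.g., brighten the target region
    bg = apply_params(image, *bg_params)   # e.g., subtly mute the rest
    return mask * fg + (1 - mask) * bg
```

Since the network only predicts a handful of parameters and the transform itself is cheap, this is consistent with inference at interactive rates on any image size.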
- Semantic Photo Manipulation with a Generative Image Prior [86.01714863596347]
GANs are able to synthesize images conditioned on inputs such as user sketch, text, or semantic labels.
However, it is hard for GANs to precisely reproduce an input image.
In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image.
Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image.
arXiv Detail & Related papers (2020-05-15T18:22:05Z)
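The "adapting the image prior" step can be read as a brief per-image fine-tune of the generator. A hedged sketch, not the paper's actual procedure (the loss, step count, and learning rate are illustrative):

```python
import copy
import torch
import torch.nn.functional as F

def adapt_generator(G, z, image, steps=200, lr=1e-4):
    G_img = copy.deepcopy(G)  # per-image copy; the shared prior is untouched
    opt = torch.optim.Adam(G_img.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(G_img(z), image)  # match this photo's statistics
        loss.backward()
        opt.step()
    # Latent-space edits of z now render on top of a faithful reconstruction.
    return G_img
```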