VecGAN: Image-to-Image Translation with Interpretable Latent Directions
- URL: http://arxiv.org/abs/2207.03411v1
- Date: Thu, 7 Jul 2022 16:31:05 GMT
- Title: VecGAN: Image-to-Image Translation with Interpretable Latent Directions
- Authors: Yusuf Dalva, Said Fahri Altindis, Aysegul Dundar
- Abstract summary: VecGAN is an image-to-image translation framework for facial attribute editing with interpretable latent directions.
VecGAN achieves significant improvements over state-of-the-art methods for both local and global edits.
- Score: 4.7590051176368915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose VecGAN, an image-to-image translation framework for facial
attribute editing with interpretable latent directions. The facial attribute
editing task faces the challenges of precise attribute editing with
controllable strength and preservation of the other attributes of an image. For
this goal, we design the attribute editing by latent space factorization and
for each attribute, we learn a linear direction that is orthogonal to the
others. The other component is the controllable strength of the change, a
scalar value. In our framework, this scalar can be either sampled or encoded
from a reference image by projection. Our work is inspired by the latent space
factorization works of fixed pretrained GANs. However, while those models
cannot be trained end-to-end and struggle to edit encoded images precisely,
VecGAN is trained end-to-end for the image translation task and succeeds at
editing an attribute while preserving the others. Our extensive experiments
show that VecGAN achieves significant improvements over state-of-the-art
methods for both local and global edits.
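The factorized edit described in the abstract can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: in VecGAN the directions and the projection encoder are learned end-to-end, whereas here orthonormal directions are simply synthesized with a QR decomposition, and all sizes and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not values from the paper.
LATENT_DIM, N_ATTRS = 512, 3

# One linear direction per attribute, mutually orthogonal. VecGAN learns
# these end-to-end; here we synthesize orthonormal columns via QR.
directions, _ = np.linalg.qr(rng.standard_normal((LATENT_DIM, N_ATTRS)))

def edit(z, attr, alpha):
    """Shift latent code z along attribute `attr` by scalar strength alpha."""
    return z + alpha * directions[:, attr]

def project_strength(z, attr):
    """Encode an attribute's strength from a latent code by projection."""
    return float(z @ directions[:, attr])

z = rng.standard_normal(LATENT_DIM)
z_edit = edit(z, attr=0, alpha=2.0)
# Orthogonality means the edit moves attribute 0's projection by exactly
# alpha while the projections onto the other directions are unchanged.
```

The scalar strength can be sampled (as `alpha` above) or, as the abstract notes, encoded from a reference image by projecting its latent code onto the attribute direction, which is what `project_strength` mimics.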
Related papers
- A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing [4.8201607588546]
We propose an auto-encoder which re-organizes the latent space of StyleGAN, so that each attribute which we wish to edit corresponds to an axis of the new latent space.
We show that our approach has greater disentanglement than competing methods, while maintaining fidelity to the original image with respect to identity.
arXiv Detail & Related papers (2023-12-13T16:18:45Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
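A minimal sketch of the randomly sampled Gaussian heatmaps this entry describes. The resolution, peak location, and bandwidth are arbitrary illustrative choices, not values from the paper, and how the heatmap is injected into the generator's intermediate layers is omitted.

```python
import numpy as np

def gaussian_heatmap(h, w, center, sigma):
    """Render an h-by-w heatmap with a Gaussian peak at `center` = (row, col)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

# A user could "move an object" by shifting the peak between inferences.
hm = gaussian_heatmap(64, 64, center=(16, 48), sigma=6.0)
```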
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Face Attribute Editing with Disentangled Latent Vectors [0.0]
We propose an image-to-image translation framework for facial attribute editing.
Inspired by the latent space factorization works of fixed pretrained GANs, we design the attribute editing by latent space factorization.
To project images to semantically organized latent spaces, we set an encoder-decoder architecture with attention-based skip connections.
arXiv Detail & Related papers (2023-01-11T18:32:13Z)
- Towards Counterfactual Image Manipulation via CLIP [106.94502632502194]
Existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images.
We investigate counterfactual editing in a text-driven manner with Contrastive Language-Image Pre-training (CLIP).
We design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives.
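A simplified sketch of steering an edit toward a predefined CLIP-space direction. This is only a cosine directional term, not the paper's full contrastive loss (which also contrasts against other directions), and all names and dimensions are hypothetical.

```python
import numpy as np

def directional_loss(e_src, e_edit, target_dir, eps=1e-8):
    """1 - cosine similarity between the embedding shift (edit - source)
    and a predefined CLIP-space attribute direction."""
    delta = e_edit - e_src
    denom = (np.linalg.norm(delta) + eps) * (np.linalg.norm(target_dir) + eps)
    return 1.0 - float(delta @ target_dir) / denom
```

Minimizing this term pulls the embedding shift of the edited image into alignment with the chosen direction; a contrastive formulation would additionally penalize alignment with the other predefined directions.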
arXiv Detail & Related papers (2022-07-06T17:02:25Z)
- SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing [94.31103255204933]
We propose a unified model for open-domain image editing focusing on color and tone adjustment of open-domain images.
Our model learns a unified editing space that is more semantic, intuitive, and easy to manipulate.
We show that by inverting image pairs into latent codes of the learned editing space, our model can be leveraged for various downstream editing tasks.
arXiv Detail & Related papers (2021-11-30T23:53:32Z)
- Designing an Encoder for StyleGAN Image Manipulation [38.909059126878354]
We study the latent space of StyleGAN, the state-of-the-art unconditional generator.
We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space.
We present an encoder based on our two principles that is specifically designed for facilitating editing on real images.
arXiv Detail & Related papers (2021-02-04T17:52:38Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- Towards Disentangling Latent Space for Unsupervised Semantic Face Editing [21.190437168936764]
Supervised attribute editing requires annotated training data which is difficult to obtain and limits the editable attributes to those with labels.
In this paper, we present a new technique termed Structure-Texture Independent Architecture with Weight Decomposition and Orthogonal Regularization (STIA-WO) to disentangle the latent space for unsupervised semantic face editing.
arXiv Detail & Related papers (2020-11-05T03:29:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.