CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
- URL: http://arxiv.org/abs/2112.05219v1
- Date: Thu, 9 Dec 2021 21:26:03 GMT
- Title: CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
- Authors: Rameen Abdal, Peihao Zhu, John Femiani, Niloy J. Mitra, Peter Wonka
- Abstract summary: StyleGAN has enabled unprecedented semantic editing capabilities on both synthesized and real images.
We propose two novel building blocks: one for finding interesting CLIP directions and one for labeling arbitrary directions in CLIP latent space.
We evaluate the effectiveness of the proposed method and demonstrate that extraction of disentangled, labeled StyleGAN edit directions is indeed possible.
- Score: 65.00528970576401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The success of StyleGAN has enabled unprecedented semantic editing
capabilities on both synthesized and real images. However, such editing
operations are either trained with semantic supervision or described using
human guidance. In another development, the CLIP architecture has been trained
with internet-scale image and text pairings and has been shown to be useful in
several zero-shot learning settings. In this work, we investigate how to
effectively link the pretrained latent spaces of StyleGAN and CLIP, which in
turn allows us to automatically extract semantically labeled edit directions
from StyleGAN, finding and naming meaningful edit operations without any
additional human guidance. Technically, we propose two novel building blocks:
one for finding interesting CLIP directions and one for labeling arbitrary
directions in CLIP latent space. The setup does not assume any pre-determined
labels and hence we do not require any additional supervised text/attributes to
build the editing framework. We evaluate the effectiveness of the proposed
method and demonstrate that extraction of disentangled labeled StyleGAN edit
directions is indeed possible, and reveals interesting and non-trivial edit
directions.
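As a rough illustration of the two building blocks described above, the sketch below finds candidate directions by running PCA over CLIP image embeddings of sampled StyleGAN images, and labels an arbitrary direction by its cosine similarity to CLIP text embeddings of a candidate vocabulary. This is a minimal sketch under assumptions, not the paper's actual algorithm: the embedding arrays, the vocabulary list, and the PCA-based direction search are illustrative placeholders.

```python
# Minimal sketch (not the paper's exact method): find candidate edit
# directions in CLIP space via PCA over CLIP image embeddings of sampled
# StyleGAN images, then name a direction by comparing it to CLIP text
# embeddings of a hypothetical candidate vocabulary.
import numpy as np
from sklearn.decomposition import PCA

# Assumed precomputed inputs (placeholders):
#   image_embeds: (N, 512) CLIP image embeddings of N StyleGAN samples
#   text_embeds:  (V, 512) CLIP text embeddings of V candidate words
#   vocabulary:   list of V candidate words

def find_directions(image_embeds: np.ndarray, k: int = 10) -> np.ndarray:
    """Return k candidate directions (principal components) in CLIP space."""
    pca = PCA(n_components=k)
    pca.fit(image_embeds - image_embeds.mean(axis=0, keepdims=True))
    return pca.components_  # shape (k, 512)

def label_direction(direction: np.ndarray,
                    text_embeds: np.ndarray,
                    vocabulary: list[str]) -> str:
    """Name a CLIP-space direction by its most similar text embedding."""
    d = direction / np.linalg.norm(direction)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sims = t @ d  # cosine similarity of each word to the direction
    return vocabulary[int(np.argmax(np.abs(sims)))]
```

In the full pipeline, labeled CLIP directions would still need to be mapped back into StyleGAN's latent space to drive actual image edits; that mapping is omitted from this sketch.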
Related papers
- Editing Arbitrary Propositions in LLMs without Subject Labels [88.67755930096966]
We introduce a simple and fast localization method called Gradient Tracing (GT).
GT allows editing arbitrary propositions instead of just binary ones, and does so without the need for subject labels.
We show that our method, without access to subject labels, performs close to state-of-the-art L&E methods that do have access to subject labels.
arXiv Detail & Related papers (2024-01-15T08:08:24Z)
- CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing [22.40686064568406]
We present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes.
Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds.
arXiv Detail & Related papers (2023-07-17T11:29:48Z)
- Robust Text-driven Image Editing Method that Adaptively Explores Directions in Latent Spaces of StyleGAN and CLIP [10.187432367590201]
A pioneering work in text-driven image editing, StyleCLIP, finds an edit direction in the CLIP space and then edits the image by mapping the direction to the StyleGAN space.
At the same time, it is difficult to tune appropriate inputs other than the original image and the text instruction for such editing.
We propose a method that constructs the edit direction adaptively in the StyleGAN and CLIP spaces with an SVM.
arXiv Detail & Related papers (2023-04-03T13:30:48Z)
- Discovering Class-Specific GAN Controls for Semantic Image Synthesis [73.91655061467988]
We propose a novel method for finding spatially disentangled class-specific directions in the latent space of pretrained SIS models.
We show that the latent directions found by our method can effectively control the local appearance of semantic classes.
arXiv Detail & Related papers (2022-12-02T21:39:26Z)
- CLIP2GAN: Towards Bridging Text with the Latent Space of GANs [128.47600914674985]
We propose a novel framework, CLIP2GAN, that leverages the CLIP model and StyleGAN.
The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN.
arXiv Detail & Related papers (2022-11-28T04:07:17Z)
- $S^2$-Flow: Joint Semantic and Style Editing of Facial Images [16.47093005910139]
Generative adversarial networks (GANs) have motivated investigations into their application to image editing.
However, GANs are often limited in the control they provide for performing specific edits.
We propose a method to disentangle a GAN's latent space into semantic and style spaces.
arXiv Detail & Related papers (2022-11-22T12:00:02Z)
- Towards Counterfactual Image Manipulation via CLIP [106.94502632502194]
Existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images.
We investigate counterfactual manipulation in a text-driven manner with Contrastive Language-Image Pre-training (CLIP).
We design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives.
arXiv Detail & Related papers (2022-07-06T17:02:25Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt (a minimal sketch of this kind of CLIP-guided latent optimization appears after this list).
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
- Towards Disentangling Latent Space for Unsupervised Semantic Face Editing [21.190437168936764]
Supervised attribute editing requires annotated training data which is difficult to obtain and limits the editable attributes to those with labels.
In this paper, we present a new technique termed Structure-Texture Independent Architecture with Weight Decomposition and Orthogonal Regularization (STIA-WO) to disentangle the latent space for unsupervised semantic face editing.
arXiv Detail & Related papers (2020-11-05T03:29:24Z)
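As referenced in the StyleCLIP entry above, the following is a minimal sketch of CLIP-guided latent optimization: a latent code is updated by gradient descent so that the CLIP embedding of the generated image moves toward the embedding of a text prompt. The `generator`, `encode_image`, and `encode_text` callables, the loss weights, and the latent regularizer are assumptions for illustration, not any paper's exact implementation.

```python
# Minimal sketch of CLIP-guided latent optimization (StyleCLIP-style).
# `generator` maps a latent code to an image tensor; `encode_image` returns
# a CLIP image embedding. Both are assumed user-supplied callables wrapping
# a pretrained StyleGAN generator and CLIP model.
import torch
import torch.nn.functional as F

def clip_guided_edit(generator, encode_image, text_embed,
                     w_init: torch.Tensor, steps: int = 200,
                     lr: float = 0.05, lambda_reg: float = 0.01):
    """Optimize a latent code so the generated image matches a text prompt."""
    w = w_init.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    text_embed = F.normalize(text_embed, dim=-1)
    for _ in range(steps):
        image = generator(w)                                   # synthesize image from latent
        img_embed = F.normalize(encode_image(image), dim=-1)
        clip_loss = 1.0 - (img_embed * text_embed).sum(dim=-1).mean()
        reg_loss = lambda_reg * (w - w_init).pow(2).mean()     # stay close to the start latent
        loss = clip_loss + reg_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return w.detach()
```

StyleCLIP's actual optimization additionally includes an identity-preservation term and a learned latent mapper for faster edits; this sketch keeps only the CLIP similarity term and a simple latent regularizer.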
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.