Robust Text-driven Image Editing Method that Adaptively Explores
Directions in Latent Spaces of StyleGAN and CLIP
- URL: http://arxiv.org/abs/2304.00964v1
- Date: Mon, 3 Apr 2023 13:30:48 GMT
- Title: Robust Text-driven Image Editing Method that Adaptively Explores
Directions in Latent Spaces of StyleGAN and CLIP
- Authors: Tsuyoshi Baba, Kosuke Nishida, Kyosuke Nishida
- Abstract summary: A pioneering work in text-driven image editing, StyleCLIP, finds an edit direction in the CLIP space and then edits the image by mapping the direction to the StyleGAN space.
However, it is difficult for users to tune the additional inputs that this approach requires beyond the original image and the text instruction.
We propose a method that adaptively constructs the edit direction in the StyleGAN and CLIP spaces with an SVM.
- Score: 10.187432367590201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic image editing has great demands because of its numerous
applications, and the use of natural language instructions is essential to
achieving flexible and intuitive editing as the user imagines. A pioneering
work in text-driven image editing, StyleCLIP, finds an edit direction in the
CLIP space and then edits the image by mapping the direction to the StyleGAN
space. However, it is difficult for users to tune the additional inputs that this
approach requires beyond the original image and the text instruction. In this
study, we propose a method that adaptively constructs the edit direction in the
StyleGAN and CLIP spaces with an SVM. Our model represents the edit direction as
a normal vector in the CLIP space, obtained by training an SVM to classify
positive and negative images. The images are retrieved from a large-scale image
corpus, originally used for pre-training StyleGAN, according to the CLIP
similarity between the images and the text instruction. We confirmed that our
model performs as well as the StyleCLIP baseline while allowing simpler inputs
and without increasing the computational time.
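The abstract describes the core procedure: retrieve positive and negative images from a large corpus by CLIP similarity to the text instruction, train an SVM on their CLIP embeddings, and take the normal vector of the separating hyperplane as the edit direction. Below is a minimal sketch of that idea using the public clip package and scikit-learn; the corpus handling, the top-/bottom-k selection rule, and all hyperparameters are illustrative assumptions, and the mapping of the resulting direction into the StyleGAN latent space is omitted.

```python
# Hedged sketch: an edit direction as an SVM normal vector in CLIP space.
# The pos/neg split rule and hyperparameters are assumptions, not the authors' exact setup.
import clip
import numpy as np
import torch
from sklearn.svm import LinearSVC

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_image_embeddings(images):
    """Encode a batch of CLIP-preprocessed images into L2-normalized features."""
    with torch.no_grad():
        feats = model.encode_image(images.to(device)).float()
    return feats / feats.norm(dim=-1, keepdim=True)

def edit_direction(corpus_images, text_instruction, k=256):
    """Return a unit edit direction in CLIP space for the given text instruction.

    corpus_images: preprocessed images drawn from a large corpus
                   (e.g., the StyleGAN pre-training data), shape (N, 3, 224, 224).
    """
    img_feats = clip_image_embeddings(corpus_images)                        # (N, D)
    with torch.no_grad():
        txt = model.encode_text(clip.tokenize([text_instruction]).to(device)).float()
    txt = txt / txt.norm(dim=-1, keepdim=True)                              # (1, D)

    # Rank corpus images by CLIP similarity to the instruction; treat the top-k
    # as positives and the bottom-k as negatives (an assumed selection rule).
    sims = (img_feats @ txt.T).squeeze(1)                                   # (N,)
    order = torch.argsort(sims, descending=True)
    pos, neg = img_feats[order[:k]], img_feats[order[-k:]]

    X = torch.cat([pos, neg]).cpu().numpy()
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]

    # The normal of the separating hyperplane, oriented toward the positive
    # (text-matching) class, serves as the edit direction in CLIP space.
    svm = LinearSVC(C=1.0).fit(X, y)
    w = svm.coef_[0]
    return w / np.linalg.norm(w)
```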
Related papers
- Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
- CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing [22.40686064568406]
We present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes.
Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds.
arXiv Detail & Related papers (2023-07-17T11:29:48Z)
- TD-GEM: Text-Driven Garment Editing Mapper [15.121103742607383]
We propose a Text-Driven Garment Editing Mapper (TD-GEM) to edit fashion items in a disentangled way.
Optimization guided by Contrastive Language-Image Pre-training (CLIP) is then used to steer the latent representation of a fashion image.
Our TD-GEM manipulates the image accurately according to the target attribute expressed as a text prompt.
arXiv Detail & Related papers (2023-05-29T14:31:54Z)
- Towards Arbitrary Text-driven Image Manipulation via Space Alignment [49.3370305074319]
We propose a new Text-driven image Manipulation framework via Space Alignment (TMSA).
TMSA aims to align the same semantic regions in CLIP and StyleGAN spaces.
The framework can support arbitrary image editing modes without additional cost.
arXiv Detail & Related papers (2023-01-25T16:20:01Z)
- CLIP2GAN: Towards Bridging Text with the Latent Space of GANs [128.47600914674985]
We propose a novel framework, CLIP2GAN, which leverages the CLIP model and StyleGAN.
The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN.
arXiv Detail & Related papers (2022-11-28T04:07:17Z)
- Bridging CLIP and StyleGAN through Latent Alignment for Image Editing [33.86698044813281]
We bridge CLIP and StyleGAN to mine diverse manipulation directions without inference-time optimization.
With this mapping scheme, we can achieve GAN inversion, text-to-image generation and text-driven image manipulation.
arXiv Detail & Related papers (2022-10-10T09:17:35Z)
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
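As summarized above, FlexIT forms a single target point in the CLIP multimodal space from the input image and the text instruction, then iteratively moves the image toward it under regularization. Below is a hedged, simplified sketch of that general idea; the mixing weight, the optimizer, the pixel-space parameterization, and the single proximity penalty standing in for the paper's regularization terms are all illustrative assumptions, not the paper's actual formulation.

```python
# Simplified sketch of "combine image and text into a CLIP target, then iterate toward it".
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_embed_image(image):
    feats = model.encode_image(image.to(device)).float()
    return feats / feats.norm(dim=-1, keepdim=True)

def clip_embed_text(text):
    with torch.no_grad():
        feats = model.encode_text(clip.tokenize([text]).to(device)).float()
    return feats / feats.norm(dim=-1, keepdim=True)

def edit_toward_target(image, instruction, steps=100, lam=0.6, reg=10.0, lr=0.05):
    """image: a CLIP-preprocessed tensor of shape (1, 3, 224, 224)."""
    image = image.to(device)
    with torch.no_grad():
        # Single target point: a convex combination of the text and image embeddings.
        target = lam * clip_embed_text(instruction) + (1.0 - lam) * clip_embed_image(image)
        target = target / target.norm(dim=-1, keepdim=True)

    x = image.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        f = clip_embed_image(x)
        # Pull the edited image toward the target in CLIP space while keeping it
        # close to the input (a single stand-in for the paper's regularizers).
        loss = (1.0 - (f * target).sum()) + reg * (x - image).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```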
- SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing [94.31103255204933]
We propose a unified model for open-domain image editing, focusing on color and tone adjustment.
Our model learns a unified editing space that is more semantic, intuitive, and easy to manipulate.
We show that by inverting image pairs into latent codes of the learned editing space, our model can be leveraged for various downstream editing tasks.
arXiv Detail & Related papers (2021-11-30T23:53:32Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
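The first StyleCLIP scheme mentioned above is a direct latent-code optimization driven by a CLIP loss. A minimal sketch of that scheme follows, assuming a pretrained `generator` that maps a latent code to an image in [-1, 1]; the learning rate, step count, and the simple L2 proximity term are illustrative assumptions, and StyleCLIP's identity-preservation loss is omitted.

```python
# Hedged sketch of CLIP-loss latent optimization against a text prompt.
import clip
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

# CLIP's published input normalization constants.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def clip_loss(images, text_tokens):
    """1 - cosine similarity between generated images and the prompt in CLIP space."""
    images = F.interpolate(images, size=224, mode="bilinear", align_corners=False)
    images = (images - CLIP_MEAN) / CLIP_STD
    img_f = clip_model.encode_image(images).float()
    txt_f = clip_model.encode_text(text_tokens).float()
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    return 1.0 - (img_f * txt_f).sum(dim=-1).mean()

def optimize_latent(generator, w_init, prompt, steps=200, lr=0.1, l2_lambda=0.008):
    """Move the latent code toward the prompt while staying close to w_init."""
    tokens = clip.tokenize([prompt]).to(device)
    w = w_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = generator(w)                      # assumed output: (1, 3, H, W) in [-1, 1]
        loss = clip_loss((img + 1) / 2, tokens) + l2_lambda * (w - w_init).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```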