TD-GEM: Text-Driven Garment Editing Mapper
- URL: http://arxiv.org/abs/2305.18120v2
- Date: Wed, 26 Jul 2023 09:19:29 GMT
- Title: TD-GEM: Text-Driven Garment Editing Mapper
- Authors: Reza Dadfar, Sanaz Sabzevari, Mårten Björkman, Danica Kragic
- Abstract summary: We propose a Text-Driven Garment Editing Mapper (TD-GEM) to edit fashion items in a disentangled way.
Optimization guided by Contrastive Language-Image Pre-training (CLIP) is then used to steer the latent representation of a fashion image.
Our TD-GEM manipulates the image accurately according to the target attribute expressed in terms of a text prompt.
- Score: 15.121103742607383
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Language-based fashion image editing allows users to try out variations of
desired garments through provided text prompts. Inspired by research on
manipulating latent representations in StyleCLIP and HairCLIP, we focus on
these latent spaces for editing fashion items of full-body human datasets.
Currently, there is a gap in handling fashion image editing due to the
complexity of garment shapes and textures and the diversity of human poses. In
this paper, we propose an optimization-based editing method called Text-Driven
Garment Editing Mapper (TD-GEM), aiming to edit fashion items in a disentangled
way. To this end, we initially obtain a latent representation of an image
through generative adversarial network inversions such as Encoder for Editing
(e4e) or Pivotal Tuning Inversion (PTI) for more accurate results. Optimization
guided by Contrastive Language-Image Pre-training (CLIP) is then used to steer
the latent representation of the fashion image toward a target attribute
expressed as a text prompt. Our TD-GEM
manipulates the image accurately according to the target attribute, while other
parts of the image are kept untouched. In our experiments, we evaluate TD-GEM
on two attributes, "color" and "sleeve length", and show that it generates
more realistic images than recent manipulation schemes.
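As a minimal sketch of the pipeline the abstract describes, the loop below optimizes an inverted latent under a CLIP loss. It assumes PyTorch, OpenAI's clip package, and a pretrained generator G (e.g., a StyleGAN) whose latent w_inv was obtained beforehand via e4e or PTI; the function names, loss weights, and the simple L2 locality term are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch: CLIP-guided latent optimization in the spirit of TD-GEM.
# Assumptions (not the authors' code): `G` maps a latent to an RGB image in
# [-1, 1]; `w_inv` is the e4e/PTI inversion of the input photo.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep gradients in fp32

# CLIP's published preprocessing statistics.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device)

def edit_latent(G, w_inv, prompt, steps=200, lr=0.05, lambda_l2=0.5):
    """Move the inverted latent toward the text prompt while penalizing
    drift from the original, so unrelated image regions stay untouched."""
    with torch.no_grad():
        txt = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    w = w_inv.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = (G(w) + 1) / 2                              # [-1, 1] -> [0, 1]
        img = F.interpolate(img, size=(224, 224), mode="bilinear",
                            align_corners=False)
        img = (img - CLIP_MEAN.view(1, 3, 1, 1)) / CLIP_STD.view(1, 3, 1, 1)
        clip_term = 1 - F.cosine_similarity(
            clip_model.encode_image(img), txt).mean()
        locality = lambda_l2 * ((w - w_inv) ** 2).mean()  # keep the edit local
        loss = clip_term + locality
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

# Hypothetical usage: w_edit = edit_latent(G, w_inv, "short sleeves")
```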
Related papers
- CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing [22.40686064568406]
We present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes.
Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds.
arXiv Detail & Related papers (2023-07-17T11:29:48Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
- FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion [16.583537785874604]
We propose a novel text-conditioned editing model, called FICE, capable of handling a wide variety of diverse text descriptions.
FICE generates highly realistic fashion images and leads to stronger editing performance than existing competing approaches.
arXiv Detail & Related papers (2023-01-05T15:33:23Z)
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning on text-guided image inpainting.
Edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
arXiv Detail & Related papers (2022-12-13T21:25:11Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models [12.06277444740134]
Generic image manipulation using a single model with flexible text inputs is highly desirable.
Recent work addresses this task by guiding generative models trained on generic images using pretrained vision-language encoders.
We propose an optimization-free method for the task of generic image manipulation from text prompts.
arXiv Detail & Related papers (2022-10-05T13:26:15Z)
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
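As a minimal sketch of the first step this summary describes, the helper below combines the input image and the edit text into a single CLIP-space target point; the mixing weight lam and the function name are illustrative assumptions, not FlexIT's exact formulation.

```python
# Hedged sketch: building a single CLIP-space target from image + text,
# as in the FlexIT summary above. `lam` is an assumed mixing weight.
import torch
import torch.nn.functional as F
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_target(image_path, text, lam=0.5):
    """Blend the image and text embeddings into one unit-norm target."""
    with torch.no_grad():
        img = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
        img_emb = F.normalize(model.encode_image(img).float(), dim=-1)
        tokens = clip.tokenize([text]).to(device)
        txt_emb = F.normalize(model.encode_text(tokens).float(), dim=-1)
    # Interpolate between the two embeddings, then renormalize.
    return F.normalize((1 - lam) * img_emb + lam * txt_emb, dim=-1)
```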
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high quality, high precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
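The latent-mapper idea in the StyleCLIP summary above replaces per-image optimization with a small network trained once, under the same kind of CLIP loss, to predict a manipulation step for any input latent; the layer layout and step scale below are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch of a StyleCLIP-style latent mapper: editing becomes a
# single forward pass instead of a per-image optimization loop.
import torch
import torch.nn as nn

class LatentMapper(nn.Module):
    def __init__(self, latent_dim=512, n_layers=4):
        super().__init__()
        layers = []
        for _ in range(n_layers):
            layers += [nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2)]
        self.mlp = nn.Sequential(*layers)

    def forward(self, w):
        # Predict a residual edit step; 0.1 is an assumed step scale.
        return w + 0.1 * self.mlp(w)

# Hypothetical usage: w_edit = LatentMapper()(w_inv)
```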