Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary
Instructions
- URL: http://arxiv.org/abs/2008.01576v2
- Date: Wed, 21 Apr 2021 13:31:55 GMT
- Title: Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary
Instructions
- Authors: Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang
Wang, Hongsheng Li
- Abstract summary: We propose a novel algorithm, named Open-Edit, which is the first attempt at open-domain image manipulation with open-vocabulary instructions.
Our approach takes advantage of the unified visual-semantic embedding space pretrained on a general image-caption dataset.
We show promising results in manipulating open-vocabulary color, texture, and high-level attributes for various scenarios of open-domain images.
- Score: 66.82547612097194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel algorithm, named Open-Edit, which is the first attempt at
open-domain image manipulation with open-vocabulary instructions. It is a
challenging task considering the large variation of image domains and the lack
of training supervision. Our approach takes advantage of the unified
visual-semantic embedding space pretrained on a general image-caption dataset,
and manipulates the embedded visual features by applying text-guided vector
arithmetic on the image feature maps. A structure-preserving image decoder then
generates the manipulated images from the manipulated feature maps. We further
propose an on-the-fly sample-specific optimization approach with
cycle-consistency constraints to regularize the manipulated images and force
them to preserve details of the source images. Our approach shows promising
results in manipulating open-vocabulary color, texture, and high-level
attributes for various scenarios of open-domain images.
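The core operation described above (text-guided vector arithmetic on image feature maps in a joint visual-semantic embedding space, followed by a structure-preserving decoder) can be pictured with the minimal PyTorch sketch below. This is an illustration, not the authors' implementation: `image_encoder`, `text_encoder`, `decoder`, and the edit strength `alpha` are assumed placeholders for the pretrained visual-semantic embedding model and decoder the abstract refers to.

```python
# Minimal sketch (not the official Open-Edit code): text-guided vector
# arithmetic on a spatial feature map in a shared visual-semantic space.
import torch
import torch.nn.functional as F

def edit_feature_map(feat, src_text_emb, tgt_text_emb, alpha=1.0):
    """Shift each spatial location of `feat` (C x H x W) away from the
    source concept and toward the target concept, weighted by how strongly
    that location responds to the source text embedding."""
    c, h, w = feat.shape
    src = F.normalize(src_text_emb, dim=0)                      # (C,)
    tgt = F.normalize(tgt_text_emb, dim=0)                      # (C,)
    flat = feat.view(c, h * w)                                  # (C, HW)
    # Cosine relevance of each location to the source concept.
    relevance = (F.normalize(flat, dim=0) * src.unsqueeze(1)).sum(0).clamp(min=0)  # (HW,)
    # Vector arithmetic: subtract the source direction, add the target one.
    shift = alpha * relevance.unsqueeze(0) * (tgt - src).unsqueeze(1)              # (C, HW)
    return (flat + shift).view(c, h, w)

# Usage sketch with hypothetical pretrained components:
# feat = image_encoder(image)                    # (C, H, W) visual feature map
# edited_feat = edit_feature_map(feat, text_encoder("green grass"),
#                                text_encoder("yellow grass"), alpha=1.5)
# edited_image = decoder(edited_feat)            # structure-preserving decoder
```

The on-the-fly, sample-specific optimization mentioned in the abstract would sit on top of this: the manipulated image is regularized with cycle-consistency constraints (applying the reverse edit should recover the source image) so that source details are preserved; that step is omitted from the sketch.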
Related papers
- ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models [55.43801602995778]
We present ImPoster, a novel algorithm for generating a target image of a 'source' subject performing a 'driving' action.
Our approach is completely unsupervised and does not require any access to additional annotations like keypoints or pose.
arXiv Detail & Related papers (2024-09-24T01:25:19Z)
- Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z)
- Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models [9.924851219904843]
We propose a framework that injects a fusion of generated visual reference and text guidance into the semantic latent space of a pre-trained diffusion model.
Using only a tiny neural network, the proposed ZIP produces diverse content and attributes under the intuitive control of a text prompt.
Compared to state-of-the-art methods, ZIP produces images of equivalent quality while providing a realistic editing effect.
arXiv Detail & Related papers (2023-08-30T08:40:15Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, as well as qualitatively when editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
- LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models [12.06277444740134]
Generic image manipulation using a single model with flexible text inputs is highly desirable.
Recent work addresses this task by guiding generative models trained on generic images with pretrained vision-language encoders.
We propose an optimization-free method for the task of generic image manipulation from text prompts.
arXiv Detail & Related papers (2022-10-05T13:26:15Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
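As a rough, hedged illustration of the optimization scheme summarized in this entry, the sketch below nudges a latent code with Adam using a CLIP similarity loss plus an L2 anchor to the initial latent. `generator` is an assumed placeholder for a pretrained StyleGAN-like model, and the loss weights and step counts are arbitrary choices for the sketch, not values from the paper.

```python
# Sketch of CLIP-guided latent optimization (in the spirit of the scheme
# described above; not the StyleCLIP code itself).
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval().requires_grad_(False)
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def edit_latent(generator, w_init, prompt, steps=50, lr=0.1, lambda_l2=0.01):
    """`generator` maps a latent code to an RGB image in [-1, 1] (assumed placeholder)."""
    with torch.no_grad():
        text_feat = F.normalize(
            clip_model.encode_text(clip.tokenize([prompt]).to(device)).float(), dim=-1)
    w = w_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = generator(w)                                         # (1, 3, H, W) in [-1, 1]
        img = (F.interpolate((img + 1) / 2, size=224, mode="bilinear") - CLIP_MEAN) / CLIP_STD
        img_feat = F.normalize(clip_model.encode_image(img).float(), dim=-1)
        clip_loss = 1 - (img_feat * text_feat).sum()               # cosine distance to prompt
        loss = clip_loss + lambda_l2 * ((w - w_init) ** 2).mean()  # stay close to the start
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```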
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
- Look here! A parametric learning based approach to redirect visual attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
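As a loose illustration only of the idea in the last entry (distinct global parametric transformations for foreground and background), the snippet below applies simple exposure and saturation adjustments to the two regions through a soft mask; the actual transformation set and the network that predicts its parameters are described in that paper, not reproduced here.

```python
# Illustrative sketch: apply distinct global parametric edits to foreground
# and background through a soft mask. The exposure/saturation parameters are
# assumptions for this sketch, not the transformation set from the paper.
import torch

def apply_params(img, exposure, saturation):
    """img: (3, H, W) tensor in [0, 1]."""
    out = (img * exposure).clamp(0, 1)
    gray = out.mean(dim=0, keepdim=True)
    return (gray + saturation * (out - gray)).clamp(0, 1)

def redirect_attention(img, fg_mask, fg_params, bg_params):
    """fg_mask: (1, H, W) soft mask in [0, 1]; *_params: (exposure, saturation)."""
    return fg_mask * apply_params(img, *fg_params) + (1 - fg_mask) * apply_params(img, *bg_params)

# e.g. emphasize the foreground and mute the background:
# edited = redirect_attention(img, mask, fg_params=(1.2, 1.3), bg_params=(0.9, 0.6))
```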
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.