DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
- URL: http://arxiv.org/abs/2310.08785v1
- Date: Thu, 12 Oct 2023 15:43:12 GMT
- Title: DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
- Authors: Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong
- Abstract summary: Text-guided image editing faces significant challenges in training and inference flexibility.
We propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model.
Experiments validate the effectiveness and versatility of DeltaEdit with different generative models.
- Score: 22.354236929932476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided image editing faces significant challenges in training and inference flexibility. Much prior work collects large amounts of annotated image-text pairs to train text-conditioned generative models from scratch, which is expensive and inefficient. Later approaches leverage pre-trained vision-language models to avoid data collection, but they are limited by either per-text-prompt optimization or inference-time hyperparameter tuning. To address these issues, we investigate and identify a specific space, referred to as CLIP DeltaSpace, where the CLIP visual feature difference of two images is semantically aligned with the CLIP textual feature difference of their corresponding text descriptions. Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model during the training phase, and predicts the latent space directions from the CLIP textual feature differences during the inference phase. This design endows DeltaEdit with two advantages: (1) text-free training; (2) generalization to various text prompts for zero-shot inference. Extensive experiments validate the effectiveness and versatility of DeltaEdit with different generative models, including both GAN and diffusion models, in achieving flexible text-guided image editing. Code is available at https://github.com/Yueming6568/DeltaEdit.
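
The two-phase design described in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration only, not the authors' released implementation (that lives in the linked repository): the CLIP image/text encoders, the generator latents, the mapper architecture (here called DeltaMapper), and all dimensions and losses are hypothetical stand-ins chosen so the example stays self-contained and runnable.

```python
import torch
import torch.nn as nn

CLIP_DIM, LATENT_DIM = 512, 512

# Hypothetical frozen stand-ins for the CLIP image and text encoders.
clip_image = nn.Linear(3 * 32 * 32, CLIP_DIM).requires_grad_(False)
clip_text = nn.Linear(77, CLIP_DIM).requires_grad_(False)


class DeltaMapper(nn.Module):
    """Maps (source latent code, CLIP feature difference) to a latent direction."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + CLIP_DIM, 512), nn.ReLU(),
            nn.Linear(512, LATENT_DIM),
        )

    def forward(self, latent, delta):
        return self.net(torch.cat([latent, delta], dim=-1))


mapper = DeltaMapper()
opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)

# Text-free training: only image pairs (and their inverted latents) are used.
for _ in range(3):  # a few toy steps on random data
    img_a = torch.randn(8, 3 * 32 * 32)
    img_b = torch.randn(8, 3 * 32 * 32)
    lat_a = torch.randn(8, LATENT_DIM)  # stand-in for the inverted latent of img_a
    lat_b = torch.randn(8, LATENT_DIM)  # stand-in for the inverted latent of img_b
    delta_i = clip_image(img_b) - clip_image(img_a)      # CLIP visual feature difference
    direction = mapper(lat_a, delta_i)                   # predicted latent-space direction
    loss = ((lat_a + direction) - lat_b).pow(2).mean()   # push a's latent toward b's
    opt.zero_grad()
    loss.backward()
    opt.step()

# Zero-shot inference: substitute the CLIP *textual* feature difference,
# relying on the DeltaSpace alignment between the two kinds of differences.
src_tok = torch.randn(1, 77)  # stand-in for a tokenized source prompt, e.g. "face"
tgt_tok = torch.randn(1, 77)  # stand-in for a tokenized target prompt, e.g. "smiling face"
delta_t = clip_text(tgt_tok) - clip_text(src_tok)
edit_direction = mapper(torch.randn(1, LATENT_DIM), delta_t)
# A generator would then synthesize from (source latent + edit_direction).
```

The point mirrored here is the abstract's claim: the mapper only ever sees CLIP visual feature differences during training (text-free), and at inference the textual feature difference is swapped in, which is what the DeltaSpace alignment is meant to justify.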
Related papers
- Selective Vision-Language Subspace Projection for Few-shot CLIP [55.361337202198925]
We introduce a method called Selective Vision-Language Subspace Projection (SSP).
SSP incorporates local image features and utilizes them as a bridge to enhance the alignment between image-text pairs.
Our approach entails only training-free matrix calculations and can be seamlessly integrated into advanced CLIP-based few-shot learning frameworks.
arXiv Detail & Related papers (2024-07-24T03:45:35Z)
- Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control [58.37323932401379]
Current diffusion models create images given a text prompt as input but struggle to correctly bind attributes mentioned in the text to the right objects in the image.
We propose focused cross-attention (FCA) that controls the visual attention maps by syntactic constraints found in the input sentence.
We show substantial improvements in T2I generation and especially its attribute-object binding on several datasets.
arXiv Detail & Related papers (2024-04-21T20:26:46Z)
- CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing [22.40686064568406]
We present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes.
Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds.
arXiv Detail & Related papers (2023-07-17T11:29:48Z)
- P+: Extended Textual Conditioning in Text-to-Image Generation [50.823884280133626]
We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$.
We show that the extended space provides greater disentangling and control over image synthesis.
We further introduce Extended Textual Inversion (XTI), where the images are inverted into $P+$, and represented by per-layer tokens.
arXiv Detail & Related papers (2023-03-16T17:38:15Z)
- DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation [86.86227840278137]
We propose a novel framework named DeltaEdit to address these problems.
Based on the CLIP delta space, the DeltaEdit network is designed to map the CLIP visual feature differences to the editing directions of StyleGAN.
arXiv Detail & Related papers (2023-03-11T02:38:31Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different stages of the synthesis process.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations [75.81725681546071]
Free-Form CLIP aims to establish an automatic latent mapping so that one manipulation model handles free-form text prompts.
For one type of image (e.g., 'human portrait'), one FFCLIP model can be learned to handle free-form text prompts.
Both visual and numerical results show that FFCLIP effectively produces semantically accurate and visually realistic images.
arXiv Detail & Related papers (2022-10-14T15:06:05Z)
- Text to Image Generation with Semantic-Spatial Aware GAN [41.73685713621705]
A text-to-image generation (T2I) model aims to generate photo-realistic images that are semantically consistent with the text descriptions.
We propose a novel framework, Semantic-Spatial Aware GAN, which is trained end-to-end so that the text encoder can exploit better text information.
arXiv Detail & Related papers (2021-04-01T15:48:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.