Each Attribute Matters: Contrastive Attention for Sentence-based Image
Editing
- URL: http://arxiv.org/abs/2110.11159v1
- Date: Thu, 21 Oct 2021 14:06:20 GMT
- Title: Each Attribute Matters: Contrastive Attention for Sentence-based Image
Editing
- Authors: Liuqing Zhao, Fan Lyu, Fuyuan Hu, Kaizhu Huang, Fenglei Xu, Linyan Li
- Abstract summary: Sentence-based Image Editing (SIE) aims to deploy natural language to edit an image.
Existing methods can hardly produce accurate edits when the query sentence contains multiple editable attributes.
This paper proposes a novel model called Contrastive Attention Generative Adversarial Network (CA-GAN).
- Score: 13.321782757637303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentence-based Image Editing (SIE) aims to deploy natural language to edit an
image. Offering the potential to reduce expensive manual editing, SIE has
attracted much interest recently. However, existing methods can hardly produce
accurate edits and may even fail at attribute editing when the query sentence
contains multiple editable attributes. To cope with this problem, by focusing
on enhancing the differences between attributes, this paper proposes a novel
model called Contrastive Attention Generative Adversarial Network (CA-GAN),
which is inspired by contrastive training. Specifically, we first design a
novel contrastive attention module to enlarge the editing difference between
random combinations of attributes formed during training. We then construct an
attribute discriminator to ensure effective editing on each attribute. A series
of experiments shows that our method generates very encouraging results for
sentence-based image editing with multiple attributes on the CUB and COCO
datasets. Our code is available at
https://github.com/Zlq2021/CA-GAN
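
The abstract describes two components (a contrastive attention module and an attribute discriminator) without implementation detail. Below is a minimal, hedged PyTorch sketch of the contrastive-attention idea only, assuming pooled attribute features and image region features; the class name, tensor shapes, and the cosine-similarity loss form are illustrative assumptions, not the paper's actual architecture (see the linked repository for the authors' code).

```python
# Minimal sketch of a contrastive-attention-style objective for SIE.
# Reconstructed only from the abstract above: module name, shapes, and the
# exact loss form are assumptions, not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveAttention(nn.Module):
    """Attends image regions with an attribute feature and pushes apart the
    features attended by two different (randomly combined) attribute sets."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the attribute feature
        self.key = nn.Linear(dim, dim)    # projects image region features

    def attend(self, attr: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # attr: (B, dim) pooled feature of one attribute combination
        # regions: (B, N, dim) image region features
        q = self.query(attr).unsqueeze(1)                    # (B, 1, dim)
        k = self.key(regions)                                # (B, N, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5         # (B, N)
        weights = torch.softmax(scores, dim=-1)              # (B, N)
        return (weights.unsqueeze(-1) * regions).sum(dim=1)  # (B, dim)

    def forward(self, attr_a, attr_b, regions):
        # Contrastive term: minimizing the cosine similarity between the two
        # attended features enlarges the editing difference between the two
        # attribute combinations.
        fa = F.normalize(self.attend(attr_a, regions), dim=-1)
        fb = F.normalize(self.attend(attr_b, regions), dim=-1)
        return (fa * fb).sum(-1).mean()


if __name__ == "__main__":
    B, N, D = 4, 49, 256
    module = ContrastiveAttention(D)
    attr_a = torch.randn(B, D)    # e.g. features of one attribute combination
    attr_b = torch.randn(B, D)    # e.g. features of another combination
    regions = torch.randn(B, N, D)
    loss = module(attr_a, attr_b, regions)
    print(loss.item())
```

In a full model, a term of this kind would be added to the usual GAN objective alongside the attribute discriminator mentioned in the abstract.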
Related papers
- An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [21.624984690721842]
D-Edit is a framework to disentangle the comprehensive image-prompt interaction into several item-prompt interactions.
It is based on pretrained diffusion models with cross-attention layers disentangled and adopts a two-step optimization to build item-prompt associations.
We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal.
arXiv Detail & Related papers (2024-03-07T20:06:29Z) - Exploring Attribute Variations in Style-based GANs using Diffusion Models [48.98081892627042]
We formulate the task of diverse attribute editing by modeling the multidimensional nature of attribute edits.
We capitalize on disentangled latent spaces of pretrained GANs and train a Denoising Diffusion Probabilistic Model (DDPM) to learn the latent distribution for diverse edits.
arXiv Detail & Related papers (2023-11-27T18:14:03Z) - Localizing and Editing Knowledge in Text-to-Image Generative Models [62.02776252311559]
Knowledge about different attributes is not localized in isolated components, but is instead distributed amongst a set of components in the conditional UNet.
We introduce Diff-QuickFix, a fast, data-free model editing method that can effectively edit concepts in text-to-image models.
arXiv Detail & Related papers (2023-10-20T17:31:12Z) - LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z) - PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor [135.17302411419834]
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in the image.
We show that having control over the properties of each object in an image leads to comprehensive editing capabilities.
Our framework allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.
arXiv Detail & Related papers (2023-03-30T17:13:56Z) - Face Attribute Editing with Disentangled Latent Vectors [0.0]
We propose an image-to-image translation framework for facial attribute editing.
Inspired by latent space factorization works on fixed pretrained GANs, we design attribute editing via latent space factorization.
To project images onto semantically organized latent spaces, we set up an encoder-decoder architecture with attention-based skip connections.
arXiv Detail & Related papers (2023-01-11T18:32:13Z) - ManiCLIP: Multi-Attribute Face Manipulation from Text [104.30600573306991]
We present a novel multi-attribute face manipulation method based on textual descriptions.
Our method generates natural manipulated faces with minimal text-irrelevant attribute editing.
arXiv Detail & Related papers (2022-10-02T07:22:55Z) - HairCLIP: Design Your Hair by Text and Reference Image [100.85116679883724]
This paper proposes a new hair editing interaction mode, which enables manipulating hair attributes individually or jointly.
We encode the image and text conditions in a shared embedding space and propose a unified hair editing framework.
With the carefully designed network structures and loss functions, our framework can perform high-quality hair editing.
arXiv Detail & Related papers (2021-12-09T18:59:58Z) - EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)