Each Attribute Matters: Contrastive Attention for Sentence-based Image
Editing
- URL: http://arxiv.org/abs/2110.11159v1
- Date: Thu, 21 Oct 2021 14:06:20 GMT
- Title: Each Attribute Matters: Contrastive Attention for Sentence-based Image
Editing
- Authors: Liuqing Zhao, Fan Lyu, Fuyuan Hu, Kaizhu Huang, Fenglei Xu, Linyan Li
- Abstract summary: Sentence-based Image Editing (SIE) aims to deploy natural language to edit an image.
Existing methods can hardly produce accurate edits when the query sentence contains multiple editable attributes.
This paper proposes a novel model called Contrastive Attention Generative Adversarial Network (CA-GAN).
- Score: 13.321782757637303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentence-based Image Editing (SIE) aims to deploy natural language to edit an
image. Offering the potential to reduce expensive manual editing, SIE has
attracted much interest recently. However, existing methods can hardly produce
accurate edits and may even fail at attribute editing when the query sentence
contains multiple editable attributes. To cope with this problem, by focusing
on enhancing the differences between attributes, this paper proposes a novel
model called Contrastive Attention Generative Adversarial Network (CA-GAN),
which is inspired by contrastive training. Specifically, we first design a
novel contrastive attention module to enlarge the editing difference between
random combinations of attributes formed during training. We then construct an
attribute discriminator to ensure effective editing on each attribute. A series
of experiments shows that our method generates very encouraging results for
sentence-based image editing with multiple attributes on the CUB and COCO
datasets. Our code is available at
https://github.com/Zlq2021/CA-GAN
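
The abstract describes two components (a contrastive attention module and an attribute discriminator) without implementation detail. Below is a minimal, hedged PyTorch sketch of the contrastive-attention idea only, assuming pooled attribute features and image region features; the class name, tensor shapes, and the cosine-similarity loss form are illustrative assumptions, not the paper's actual architecture (see the linked repository for the authors' code).

```python
# Minimal sketch of a contrastive-attention-style objective for SIE.
# Reconstructed only from the abstract above: module name, shapes, and the
# exact loss form are assumptions, not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveAttention(nn.Module):
    """Attends image regions with an attribute feature and pushes apart the
    features attended by two different (randomly combined) attribute sets."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the attribute feature
        self.key = nn.Linear(dim, dim)    # projects image region features

    def attend(self, attr: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # attr: (B, dim) pooled feature of one attribute combination
        # regions: (B, N, dim) image region features
        q = self.query(attr).unsqueeze(1)                    # (B, 1, dim)
        k = self.key(regions)                                # (B, N, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5         # (B, N)
        weights = torch.softmax(scores, dim=-1)              # (B, N)
        return (weights.unsqueeze(-1) * regions).sum(dim=1)  # (B, dim)

    def forward(self, attr_a, attr_b, regions):
        # Contrastive term: minimizing the cosine similarity between the two
        # attended features enlarges the editing difference between the two
        # attribute combinations.
        fa = F.normalize(self.attend(attr_a, regions), dim=-1)
        fb = F.normalize(self.attend(attr_b, regions), dim=-1)
        return (fa * fb).sum(-1).mean()


if __name__ == "__main__":
    B, N, D = 4, 49, 256
    module = ContrastiveAttention(D)
    attr_a = torch.randn(B, D)    # e.g. features of one attribute combination
    attr_b = torch.randn(B, D)    # e.g. features of another combination
    regions = torch.randn(B, N, D)
    loss = module(attr_a, attr_b, regions)
    print(loss.item())
```

In a full model, a term of this kind would be added to the usual GAN objective alongside the attribute discriminator mentioned in the abstract.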
Related papers
- An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [21.624984690721842]
D-Edit is a framework to disentangle the comprehensive image-prompt interaction into several item-prompt interactions.
It is based on pretrained diffusion models with cross-attention layers disentangled and adopts a two-step optimization to build item-prompt associations.
We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal.
arXiv Detail & Related papers (2024-03-07T20:06:29Z) - Exploring Attribute Variations in Style-based GANs using Diffusion Models [48.98081892627042]
We formulate the task of diverse attribute editing by modeling the multidimensional nature of attribute edits.
We capitalize on disentangled latent spaces of pretrained GANs and train a Denoising Diffusion Probabilistic Model (DDPM) to learn the latent distribution for diverse edits.
arXiv Detail & Related papers (2023-11-27T18:14:03Z) - Localizing and Editing Knowledge in Text-to-Image Generative Models [62.02776252311559]
Knowledge about different attributes is not localized in isolated components, but is instead distributed amongst a set of components in the conditional UNet.
We introduce Diff-QuickFix, a fast, data-free model editing method that can effectively edit concepts in text-to-image models.
arXiv Detail & Related papers (2023-10-20T17:31:12Z) - LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z) - PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor [135.17302411419834]
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in the image.
We show that having control over the properties of each object in an image leads to comprehensive editing capabilities.
Our framework allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.
arXiv Detail & Related papers (2023-03-30T17:13:56Z) - Face Attribute Editing with Disentangled Latent Vectors [0.0]
We propose an image-to-image translation framework for facial attribute editing.
Inspired by latent space factorization works on fixed pretrained GANs, we design attribute editing via latent space factorization.
To project images onto semantically organized latent spaces, we set up an encoder-decoder architecture with attention-based skip connections.
arXiv Detail & Related papers (2023-01-11T18:32:13Z) - ManiCLIP: Multi-Attribute Face Manipulation from Text [104.30600573306991]
We present a novel multi-attribute face manipulation method based on textual descriptions.
Our method generates natural manipulated faces with minimal text-irrelevant attribute editing.
arXiv Detail & Related papers (2022-10-02T07:22:55Z) - HairCLIP: Design Your Hair by Text and Reference Image [100.85116679883724]
This paper proposes a new hair editing interaction mode, which enables manipulating hair attributes individually or jointly.
We encode the image and text conditions in a shared embedding space and propose a unified hair editing framework.
With the carefully designed network structures and loss functions, our framework can perform high-quality hair editing.
arXiv Detail & Related papers (2021-12-09T18:59:58Z) - EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)