Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion
Image Manipulation
- URL: http://arxiv.org/abs/2210.05872v1
- Date: Wed, 12 Oct 2022 02:21:18 GMT
- Title: Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion
Image Manipulation
- Authors: Chaerin Kong, DongHyeon Jeon, Ohjoon Kwon, Nojun Kwak
- Abstract summary: Fashion attribute editing is a task that aims to convert the semantic attributes of a given fashion image while preserving the irrelevant regions.
Previous works typically employ conditional GANs where the generator explicitly learns the target attributes and directly execute the conversion.
We explore the classifier-guided diffusion that leverages the off-the-shelf diffusion model pretrained on general visual semantics such as Imagenet.
- Score: 27.587905673112473
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Fashion attribute editing is a task that aims to convert the semantic
attributes of a given fashion image while preserving the irrelevant regions.
Previous works typically employ conditional GANs where the generator explicitly
learns the target attributes and directly execute the conversion. These
approaches, however, are neither scalable nor generic as they operate only with
few limited attributes and a separate generator is required for each dataset or
attribute set. Inspired by the recent advancement of diffusion models, we
explore the classifier-guided diffusion that leverages the off-the-shelf
diffusion model pretrained on general visual semantics such as Imagenet. In
order to achieve a generic editing pipeline, we pose this as multi-attribute
image manipulation task, where the attribute ranges from item category, fabric,
pattern to collar and neckline. We empirically show that conventional methods
fail in our challenging setting, and study efficient adaptation scheme that
involves recently introduced attention-pooling technique to obtain a
multi-attribute classifier guidance. Based on this, we present a mask-free
fashion attribute editing framework that leverages the classifier logits and
the cross-attention map for manipulation. We empirically demonstrate that our
framework achieves convincing sample quality and attribute alignments.
Related papers
- Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search [19.610244285078483]
We propose an Attribute-Aware Implicit Modality Alignment (AIMA) framework to learn the correspondence of local representations between textual attributes and images.
We show that our proposed method significantly surpasses the current state-of-the-art methods.
arXiv Detail & Related papers (2024-06-06T03:34:42Z) - Exploring Attribute Variations in Style-based GANs using Diffusion
Models [48.98081892627042]
We formulate the task of textitdiverse attribute editing by modeling the multidimensional nature of attribute edits.
We capitalize on disentangled latent spaces of pretrained GANs and train a Denoising Diffusion Probabilistic Model (DDPM) to learn the latent distribution for diverse edits.
arXiv Detail & Related papers (2023-11-27T18:14:03Z) - Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale
Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder structure network of a reconstruction task to unsupervisedly distill high-level attribute-specific vectors.
Our models are equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities.
arXiv Detail & Related papers (2023-11-21T08:20:38Z) - ManiCLIP: Multi-Attribute Face Manipulation from Text [104.30600573306991]
We present a novel multi-attribute face manipulation method based on textual descriptions.
Our method generates natural manipulated faces with minimal text-irrelevant attribute editing.
arXiv Detail & Related papers (2022-10-02T07:22:55Z) - Everything is There in Latent Space: Attribute Editing and Attribute
Style Manipulation by StyleGAN Latent Space Exploration [39.18239951479647]
We present Few-shot Latent-based Attribute Manipulation and Editing (FLAME)
FLAME is a framework to perform highly controlled image editing by latent space manipulation.
We generate diverse attribute styles in disentangled manner.
arXiv Detail & Related papers (2022-07-20T12:40:32Z) - Attribute Group Editing for Reliable Few-shot Image Generation [85.52840521454411]
We propose a new editing-based method, i.e., Attribute Group Editing (AGE), for few-shot image generation.
AGE examines the internal representation learned in GANs and identifies semantically meaningful directions.
arXiv Detail & Related papers (2022-03-16T06:54:09Z) - SMILE: Semantically-guided Multi-attribute Image and Layout Editing [154.69452301122175]
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs)
We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z) - Diverse Image Generation via Self-Conditioned GANs [56.91974064348137]
We train a class-conditional GAN model without using manually annotated class labels.
Instead, our model is conditional on labels automatically derived from clustering in the discriminator's feature space.
Our clustering step automatically discovers diverse modes, and explicitly requires the generator to cover them.
arXiv Detail & Related papers (2020-06-18T17:56:03Z) - MulGAN: Facial Attribute Editing by Exemplar [2.272764591035106]
Methods encode attribute-related information in images into the predefined region of the latent feature space by employing a pair of images with opposite attributes as input to train model.
They suffer from three limitations: (1) the model must be trained using a pair of images with opposite attributes as input; (2) weak capability of editing multiple attributes by exemplars; and (3) poor quality of generating image.
arXiv Detail & Related papers (2019-12-28T04:02:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.