Magic Clothing: Controllable Garment-Driven Image Synthesis
- URL: http://arxiv.org/abs/2404.09512v2
- Date: Wed, 24 Jul 2024 04:06:12 GMT
- Title: Magic Clothing: Controllable Garment-Driven Image Synthesis
- Authors: Weifeng Chen, Tao Gu, Yuhao Xu, Chengcai Chen,
- Abstract summary: We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.
When generating customized characters that wear the target garments under diverse text prompts, image controllability is the most critical issue.
We introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs.
- Score: 7.46772222515689
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task. When generating customized characters that wear the target garments under diverse text prompts, image controllability is the most critical issue, i.e., preserving the garment details while remaining faithful to the text prompts. To this end, we introduce a garment extractor to capture detailed garment features, and employ self-attention fusion to incorporate them into pretrained LDMs, ensuring that the garment details remain unchanged on the target character. Then, we leverage joint classifier-free guidance to balance the control of garment features and text prompts over the generated results. Meanwhile, the proposed garment extractor is a plug-in module applicable to various finetuned LDMs, and it can be combined with other extensions like ControlNet and IP-Adapter to enhance the diversity and controllability of the generated characters. Furthermore, we design Matched-Points-LPIPS (MP-LPIPS), a robust metric for evaluating the consistency of the target image with the source garment. Extensive experiments demonstrate that Magic Clothing achieves state-of-the-art results under various conditional controls for garment-driven image synthesis. Our source code is available at https://github.com/ShineChen1024/MagicClothing.
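The abstract does not spell out how self-attention fusion works. A common way to inject reference features into a pretrained UNet, and a plausible reading here, is to append the garment extractor's tokens to the keys and values of each self-attention layer in the denoising UNet, so every token of the generated image can attend to garment tokens while the output keeps its original sequence length. The pure-Python sketch below assumes that design; the single-head attention, the function names, and the weight matrices `w_q`, `w_k`, `w_v` are illustrative assumptions, not Magic Clothing's actual code.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: one row per token


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def fused_self_attention(h: Matrix, garment: Matrix,
                         w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head self-attention whose context is extended with garment tokens.

    h:       (n, d) hidden tokens of the denoising UNet at one layer
    garment: (m, d) tokens from the (assumed) garment extractor at the same layer
    Returns an (n, d_v) matrix: one output row per token in h.
    """
    # Queries come only from the image being generated.
    q = matmul(h, w_q)
    # Keys and values are computed over the concatenation [h; garment].
    context = h + garment  # list concatenation along the token axis
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = 1.0 / math.sqrt(len(w_k[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)
```

With an empty `garment` list the function reduces to ordinary self-attention, so in this sketch the pretrained layer's weights are reused unchanged and only the attention context grows, which is what makes such a design convenient as a plug-in.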
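Joint classifier-free guidance combines noise predictions made under different subsets of the conditions. The abstract gives no formula, so the sketch below assumes a decomposition modeled on the two-condition guidance popularized by InstructPix2Pix, with the garment features g and the text prompt t as the two conditions: eps(z, none, none) + w_g * [eps(z, g, none) - eps(z, none, none)] + w_t * [eps(z, g, t) - eps(z, g, none)]. The `Denoiser` interface, `toy_eps`, and the weights `w_g`, `w_t` are hypothetical; the paper's exact formulation and default weights may differ.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser interface: (noisy latent, garment condition or None,
# text condition or None) -> predicted noise. None stands for the null condition.
Denoiser = Callable[[Vector, Optional[str], Optional[str]], Vector]


def joint_cfg(eps: Denoiser, z: Vector, garment: str, text: str,
              w_g: float, w_t: float) -> Vector:
    """Assumed joint guidance:
    eps(z, 0, 0) + w_g * [eps(z, g, 0) - eps(z, 0, 0)] + w_t * [eps(z, g, t) - eps(z, g, 0)]
    """
    e_none = eps(z, None, None)        # fully unconditional
    e_garment = eps(z, garment, None)  # garment only
    e_both = eps(z, garment, text)     # garment and text
    return [
        n + w_g * (g - n) + w_t * (b - g)
        for n, g, b in zip(e_none, e_garment, e_both)
    ]


def toy_eps(z: Vector, garment: Optional[str], text: Optional[str]) -> Vector:
    """Stand-in denoiser: each condition adds a fixed offset to the prediction."""
    shift = (1.0 if garment is not None else 0.0) + (2.0 if text is not None else 0.0)
    return [0.1 * x + shift for x in z]
```

Raising `w_g` pushes the sample toward the garment and raising `w_t` toward the prompt; with `w_g = w_t = 1` the expression collapses to the fully conditioned prediction `eps(z, g, t)`, and with both set to 0 it returns the unconditional prediction.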
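The abstract names MP-LPIPS but not its procedure. Reading the name literally, one plausible version finds corresponding points between the source garment and the generated image, compares small patches around each matched pair with LPIPS, and averages the results, presumably so that a garment appearing at a different position or pose is not penalized the way a pixel-aligned comparison would penalize it. The sketch below assumes that reading: the point matches are supplied as input (how they are obtained is not specified here), images are grayscale 2D lists, and `mse` is a placeholder for LPIPS, which requires a pretrained network. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image, row-major, values in [0, 1]
Point = Tuple[int, int]    # (row, col)


def crop(img: Image, center: Point, r: int) -> Image:
    """Square patch of side 2r+1 centered on `center`."""
    y, x = center
    return [row[x - r:x + r + 1] for row in img[y - r:y + r + 1]]


def inside(img: Image, p: Point, r: int) -> bool:
    """True if the whole patch around p lies within the image."""
    y, x = p
    return r <= y < len(img) - r and r <= x < len(img[0]) - r


def mse(a: Image, b: Image) -> float:
    """Placeholder patch distance; a faithful version would use LPIPS here."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def matched_points_distance(
    garment: Image,
    generated: Image,
    matches: Sequence[Tuple[Point, Point]],
    radius: int = 1,
    patch_distance: Callable[[Image, Image], float] = mse,
) -> float:
    """Average patch distance over matched (garment point, generated point) pairs.

    Pairs whose patch would cross an image border are skipped.
    """
    scores = [
        patch_distance(crop(garment, pg, radius), crop(generated, pt, radius))
        for pg, pt in matches
        if inside(garment, pg, radius) and inside(generated, pt, radius)
    ]
    if not scores:
        raise ValueError("no usable matches")
    return sum(scores) / len(scores)
```

Because patches are compared only at matched locations, a generated image that shows the same texture shifted by a few pixels scores 0 here, whereas comparing the two images pixel by pixel at the same coordinates would not.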
Related papers
- IMAGDressing-v1: Customizable Virtual Dressing [39.78771546133316]
IMAGDressing-v1 addresses a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification [47.948622774810296]
We propose a novel framework called CLIP-Driven Cloth-Agnostic Feature Learning (CCAF) for Cloth-Changing Person Re-Identification (CC-ReID).
Two modules were custom-designed: the Invariant Feature Prompting (IFP) and the Clothes Feature Minimization (CFM).
Experiments have demonstrated the effectiveness of the proposed CCAF, achieving new state-of-the-art performance on several popular CC-ReID benchmarks without any additional inference time.
arXiv Detail & Related papers (2024-06-13T14:56:07Z)
- MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation [70.83668869857665]
MMTryon is a multi-modal multi-reference VIrtual Try-ON framework.
It can generate high-quality compositional try-on results by taking a text instruction and multiple garment images as inputs.
arXiv Detail & Related papers (2024-05-01T11:04:22Z)
- LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation [62.232361821779335]
We introduce a tuning-free attention control framework, encapsulated by a progressive process of LLM planning, prompt-Aware editing, and StablE animation geneRation, abbreviated as LASER.
We manipulate the model's spatial features and self-attention mechanisms to maintain animation integrity.
Our meticulous control over spatial features and self-attention ensures structural consistency in the images.
arXiv Detail & Related papers (2024-04-21T07:13:56Z)
- StableGarment: Garment-Centric Generation via Stable Diffusion [29.5112874761836]
We introduce StableGarment, a unified framework to tackle garment-centric(GC) generation tasks.
Our solution involves the development of a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention layers.
The incorporation of a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision.
arXiv Detail & Related papers (2024-03-16T03:05:07Z)
- TD-GEM: Text-Driven Garment Editing Mapper [15.121103742607383]
We propose a Text-Driven Garment Editing Mapper (TD-GEM) to edit fashion items in a disentangled way.
Optimization guided by Contrastive Language-Image Pre-training (CLIP) is used to steer the latent representation of a fashion image.
Our TD-GEM manipulates the image accurately according to the target attribute expressed in terms of a text prompt.
arXiv Detail & Related papers (2023-05-29T14:31:54Z)
- ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design [66.68194916359309]
Cross-modal fashion image synthesis has emerged as one of the most promising directions in the generation domain.
MaskCLIP decomposes the garments into semantic parts, ensuring fine-grained and semantically accurate alignment between the visual and text information.
ARMANI discretizes an image into uniform tokens using a learned cross-modal codebook in its first stage, and then uses a Transformer to model the distribution of image tokens for a real image.
arXiv Detail & Related papers (2022-08-11T03:44:02Z)
- Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing [85.74977256940855]
We propose an Arbitrary Virtual Try-On Network (AVTON) for all-type clothes.
AVTON can synthesize realistic try-on images by preserving and trading off characteristics of the target clothes and the reference person.
Our approach achieves better performance than state-of-the-art virtual try-on methods.
arXiv Detail & Related papers (2021-11-24T08:59:56Z)
- Per Garment Capture and Synthesis for Real-time Virtual Try-on [15.128477359632262]
Existing image-based works try to synthesize a try-on image from a single image of a target garment.
It is difficult to reproduce the change of wrinkles caused by pose and body size change, as well as pulling and stretching of the garment by hand.
We propose an alternative per garment capture and synthesis workflow to handle such rich interactions by training the model with many systematically captured images.
arXiv Detail & Related papers (2021-09-10T03:49:37Z)
- BCNet: Learning Body and Cloth Shape from A Single Image [56.486796244320125]
We propose a layered garment representation on top of SMPL and, as a novel step, make the garment's skinning weights independent of the body mesh.
Compared with existing methods, our method can support more garment categories and recover more accurate geometry.
arXiv Detail & Related papers (2020-04-01T03:41:36Z)