Magic Clothing: Controllable Garment-Driven Image Synthesis
- URL: http://arxiv.org/abs/2404.09512v2
- Date: Wed, 24 Jul 2024 04:06:12 GMT
- Title: Magic Clothing: Controllable Garment-Driven Image Synthesis
- Authors: Weifeng Chen, Tao Gu, Yuhao Xu, Chengcai Chen
- Abstract summary: We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.
When generating customized characters wearing the target garments under diverse text prompts, image controllability is the most critical issue.
We introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs.
- Score: 7.46772222515689
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task. When generating customized characters wearing the target garments under diverse text prompts, image controllability is the most critical issue, i.e., to preserve the garment details and maintain faithfulness to the text prompts. To this end, we introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs, ensuring that the garment details remain unchanged on the target character. Then, we leverage joint classifier-free guidance to balance the control of garment features and text prompts over the generated results. Meanwhile, the proposed garment extractor is a plug-in module applicable to various finetuned LDMs, and it can be combined with other extensions like ControlNet and IP-Adapter to enhance the diversity and controllability of the generated characters. Furthermore, we design Matched-Points-LPIPS (MP-LPIPS), a robust metric for evaluating the consistency of the target image to the source garment. Extensive experiments demonstrate that our Magic Clothing achieves state-of-the-art results under various conditional controls for garment-driven image synthesis. Our source code is available at https://github.com/ShineChen1024/MagicClothing.
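The two mechanisms named in the abstract, self-attention fusion of garment features and joint classifier-free guidance, can be illustrated with a minimal sketch. The single-head attention, helper names, and the particular guidance decomposition below are assumptions for illustration, not the authors' released implementation (see the repository above for that).

```python
# Minimal sketch of (1) self-attention fusion: garment tokens from the garment
# extractor are appended to the keys/values of the UNet's self-attention, and
# (2) one plausible form of joint classifier-free guidance. All names, shapes,
# and guidance scales are illustrative assumptions, not the authors' code.
import torch

def fused_self_attention(x, garment_feats, to_q, to_k, to_v):
    # x: (B, N, C) UNet hidden tokens; garment_feats: (B, M, C) features from the
    # garment extractor at the matching layer; to_q/to_k/to_v: the pretrained LDM's
    # linear projections (single-head shown for brevity).
    q = to_q(x)
    kv = torch.cat([x, garment_feats], dim=1)             # append garment tokens
    k, v = to_k(kv), to_v(kv)
    attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v                                        # garment details attended into x

def joint_cfg(eps_uncond, eps_garment, eps_full, s_garment=2.5, s_text=5.0):
    # The garment term moves the unconditional noise prediction toward the garment
    # condition; the text term moves the garment-conditioned prediction toward the
    # jointly conditioned one. The paper's exact formulation and scales may differ.
    return (eps_uncond
            + s_garment * (eps_garment - eps_uncond)
            + s_text * (eps_full - eps_garment))
```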
Related papers
- Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling in numerous image synthesis tasks.
We present a new diffusion model, GarDiff, which performs a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z)
- Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video [66.98046635045685]
We introduce a novel approach for reconstructing realistic simulation-ready garment assets from multi-view videos.
Our method represents garments with a combination of a 3D mesh and a Gaussian texture that encodes both the color and high-frequency surface details.
This representation enables accurate registration of garment geometries to multi-view videos and helps disentangle albedo textures from lighting effects.
arXiv Detail & Related papers (2024-09-12T16:26:47Z)
- Multi-Garment Customized Model Generation [3.1679243514285194]
Multi-Garment Customized Model Generation is a unified framework based on Latent Diffusion Models (LDMs).
Our framework supports the conditional generation of multiple garments through decoupled multi-garment feature fusion.
The proposed garment encoder is a plug-and-play module that can be combined with other extension modules.
arXiv Detail & Related papers (2024-08-09T17:57:33Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 addresses a virtual dressing task: generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
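A rough sketch of such a hybrid block is given below; the assumption that the frozen self-attention and trainable cross-attention outputs are simply summed into a residual, as well as the module names, are illustrative rather than IMAGDressing-v1's actual wiring.

```python
# Hedged sketch of a hybrid attention block: a frozen self-attention branch (pretrained
# denoising-UNet weights) plus a trainable cross-attention branch that reads garment-UNet
# features. The residual summation and names are assumptions, not IMAGDressing-v1's code.
import torch.nn as nn

class HybridAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        for p in self.self_attn.parameters():   # frozen branch: keep pretrained weights fixed
            p.requires_grad = False

    def forward(self, x, garment_feats):
        # x: (B, N, C) denoising-UNet tokens; garment_feats: (B, M, C) garment-UNet features
        sa, _ = self.self_attn(x, x, x)                            # frozen self-attention
        ca, _ = self.cross_attn(x, garment_feats, garment_feats)   # trainable garment injection
        return x + sa + ca
```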
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification [47.948622774810296]
We propose a novel framework called CLIP-Driven Cloth-Agnostic Feature Learning (CCAF) for Cloth-Changing Person Re-Identification (CC-ReID).
Two custom-designed modules are introduced: Invariant Feature Prompting (IFP) and Clothes Feature Minimization (CFM).
Experiments have demonstrated the effectiveness of the proposed CCAF, achieving new state-of-the-art performance on several popular CC-ReID benchmarks without any additional inference time.
arXiv Detail & Related papers (2024-06-13T14:56:07Z)
- StableGarment: Garment-Centric Generation via Stable Diffusion [29.5112874761836]
We introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks.
Our solution involves the development of a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention layers.
The incorporation of a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision.
arXiv Detail & Related papers (2024-03-16T03:05:07Z)
- TD-GEM: Text-Driven Garment Editing Mapper [15.121103742607383]
We propose a Text-Driven Garment Editing Mapper (TD-GEM) to edit fashion items in a disentangled way.
An optimization-based Contrastive Language-Image Pre-training (CLIP) loss is then used to guide the latent representation of a fashion image.
Our TD-GEM manipulates the image accurately according to the target attribute expressed in terms of a text prompt.
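As a rough illustration of optimization-based CLIP guidance of a latent code, the sketch below assumes a pretrained generator G and an initial latent obtained by inverting the source fashion image; both are hypothetical placeholders, and TD-GEM's actual mapper and loss terms differ in detail.

```python
# Hedged sketch of CLIP-guided latent optimization. G and initial_latent are hypothetical
# placeholders (e.g., a StyleGAN-like generator and the inverted latent of the source
# image); this is not TD-GEM's implementation.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cpu"  # fp32 weights simplify backpropagating through CLIP in this sketch
clip_model, _ = clip.load("ViT-B/32", device=device)
for p in clip_model.parameters():
    p.requires_grad_(False)
with torch.no_grad():
    text_feat = clip_model.encode_text(clip.tokenize(["a red long-sleeve shirt"]).to(device))

latent = initial_latent.clone().requires_grad_(True)   # hypothetical source-image latent
optimizer = torch.optim.Adam([latent], lr=0.01)

for step in range(200):
    image = G(latent)                                   # hypothetical render in [-1, 1], (1, 3, H, W)
    image = F.interpolate((image + 1) / 2, size=224, mode="bilinear", align_corners=False)
    image_feat = clip_model.encode_image(image)         # CLIP's channel normalization omitted for brevity
    loss = 1 - F.cosine_similarity(image_feat, text_feat).mean()   # CLIP loss: match the prompt
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```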
arXiv Detail & Related papers (2023-05-29T14:31:54Z)
- ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design [66.68194916359309]
Cross-modal fashion image synthesis has emerged as one of the most promising directions in the generation domain.
MaskCLIP decomposes the garments into semantic parts, ensuring fine-grained and semantically accurate alignment between the visual and text information.
In its first stage, ARMANI discretizes an image into uniform tokens based on a learned cross-modal codebook; a Transformer then models the distribution of image tokens for real images.
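A compact sketch of this two-stage idea follows; the codebook size, model dimensions, and wiring are assumptions for illustration, not ARMANI's implementation (which additionally conditions on text with part-level alignment).

```python
# Hedged sketch of the two-stage idea: stage 1 quantizes encoder features against a
# learned codebook to obtain discrete image tokens; stage 2 models the token sequence
# with an autoregressive Transformer. Sizes and wiring are illustrative assumptions.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=1024, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, feats):                                 # feats: (B, N, dim) encoder features
        flat = feats.reshape(-1, feats.size(-1))
        dists = torch.cdist(flat, self.codebook.weight)       # distance to every code vector
        tokens = dists.argmin(dim=-1).view(feats.shape[:-1])  # (B, N) discrete token ids
        return tokens, self.codebook(tokens)                  # ids and their quantized vectors

dim, num_codes, seq_len = 256, 1024, 64
quantizer = VectorQuantizer(num_codes, dim)
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=6)
to_logits = nn.Linear(dim, num_codes)                         # logits over codebook entries

tokens, quantized = quantizer(torch.randn(2, seq_len, dim))
causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
logits = to_logits(transformer(quantized, mask=causal))       # per-position next-token logits
```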
arXiv Detail & Related papers (2022-08-11T03:44:02Z)
- Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing [85.74977256940855]
We propose an Arbitrary Virtual Try-On Network (AVTON) for all types of clothes.
AVTON can synthesize realistic try-on images by preserving and trading off characteristics of the target clothes and the reference person.
Our approach can achieve better performance compared with the state-of-the-art virtual try-on methods.
arXiv Detail & Related papers (2021-11-24T08:59:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.