IMAGDressing-v1: Customizable Virtual Dressing
- URL: http://arxiv.org/abs/2407.12705v2
- Date: Tue, 6 Aug 2024 13:06:26 GMT
- Title: IMAGDressing-v1: Customizable Virtual Dressing
- Authors: Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang
- Abstract summary: IMAGDressing-v1 targets the virtual dressing (VD) task, which generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
- Score: 58.44155202253754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional faces, poses, and scenes. To address this issue, we define a virtual dressing (VD) task focused on generating freely editable human images with fixed garments and optional conditions. Meanwhile, we design a comprehensive affinity metric index (CAMI) to evaluate the consistency between generated images and reference garments. Then, we propose IMAGDressing-v1, which incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE. We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet, ensuring users can control different scenes through text. IMAGDressing-v1 can be combined with other extension plugins, such as ControlNet and IP-Adapter, to enhance the diversity and controllability of generated images. Furthermore, to address the lack of data, we release the interactive garment pairing (IGPair) dataset, containing over 300,000 pairs of clothing and dressed images, and establish a standard pipeline for data assembly. Extensive experiments demonstrate that our IMAGDressing-v1 achieves state-of-the-art human image synthesis performance under various controlled conditions. The code and model will be available at https://github.com/muzishen/IMAGDressing.
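The hybrid attention module described in the abstract can be illustrated with a minimal sketch. The abstract only states that a frozen self-attention and a trainable cross-attention are combined; everything else here is an assumption made for illustration: the additive combination, the shared query projection, the `garment_scale` weight, and all class and function names are hypothetical and are not taken from the released code.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention over lists of row vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s * scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned projection weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class HybridAttention:
    """Hypothetical hybrid attention: frozen self-attention plus a
    trainable cross-attention over garment features (assumed additive)."""

    def __init__(self, dim, rng, garment_scale=1.0):
        # Projections of the denoising UNet: treated as frozen (never updated).
        self.frozen = {n: random_matrix(dim, dim, rng) for n in ("wq", "wk", "wv")}
        # Projections for garment features: the only trainable parameters.
        self.trainable = {n: random_matrix(dim, dim, rng) for n in ("wk_g", "wv_g")}
        self.garment_scale = garment_scale  # assumed weighting of the garment branch

    def __call__(self, hidden, garment):
        q = matmul(hidden, self.frozen["wq"])
        # Frozen branch: hidden states attend to themselves.
        self_out = attention(
            q, matmul(hidden, self.frozen["wk"]), matmul(hidden, self.frozen["wv"])
        )
        # Trainable branch: the same queries attend to garment features.
        cross_out = attention(
            q, matmul(garment, self.trainable["wk_g"]), matmul(garment, self.trainable["wv_g"])
        )
        return [
            [s + self.garment_scale * c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(self_out, cross_out)
        ]


rng = random.Random(0)
layer = HybridAttention(dim=4, rng=rng)
hidden = random_matrix(3, 4, rng)   # 3 image tokens from the denoising UNet
garment = random_matrix(5, 4, rng)  # 5 garment tokens from the garment UNet
out = layer(hidden, garment)        # same shape as `hidden`: 3 tokens of width 4
```

Under these assumptions, setting `garment_scale` to 0 reduces the layer to the frozen self-attention alone, which is one way to see why the base model's behavior could be preserved while only the garment branch is trained.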
Related papers
- ITVTON: Virtual Try-On Diffusion Transformer Model Based on Integrated Image and Text [0.0]
We introduce ITVTON, a method that enhances clothing-character interactions by combining clothing and character images along spatial channels as inputs.
We incorporate integrated textual descriptions from multiple images to boost the realism of the generated visual effects.
In experiments, ITVTON outperforms baseline methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2025-01-28T07:24:15Z)
- AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models [7.534556848810697]
We propose a novel AnyDressing method for customizing characters conditioned on any combination of garments and personalized text prompts.
AnyDressing comprises two primary networks named GarmentsNet and DressingNet, which are respectively dedicated to extracting detailed clothing features and generating customized images.
We introduce a Garment-Enhanced Texture Learning strategy to improve the fine-grained texture details of garments.
arXiv Detail & Related papers (2024-12-05T13:16:47Z)
- FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on [73.13242624924814]
FitDiT, a garment perception enhancement technique, is designed for high-fidelity virtual try-on using Diffusion Transformers (DiT).
We introduce a garment texture extractor that incorporates garment priors evolution to fine-tune garment features, making it easier to capture rich details such as stripes, patterns, and text.
We also employ a dilated-relaxed mask strategy that adapts to the correct length of garments, preventing the generation of garments that fill the entire mask area during cross-category try-on.
arXiv Detail & Related papers (2024-11-15T11:02:23Z)
- MV-VTON: Multi-View Virtual Try-On with Diffusion Models [91.71150387151042]
The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing.
Existing methods focus solely on frontal try-on using frontal clothing.
We introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes.
arXiv Detail & Related papers (2024-04-26T12:27:57Z)
- Magic Clothing: Controllable Garment-Driven Image Synthesis [7.46772222515689]
We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.
Because the goal is to generate customized characters wearing the target garments under diverse text prompts, image controllability is the most critical issue.
We introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs.
arXiv Detail & Related papers (2024-04-15T07:15:39Z)
- StableGarment: Garment-Centric Generation via Stable Diffusion [29.5112874761836]
We introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks.
Our solution involves the development of a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention layers.
The incorporation of a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision.
arXiv Detail & Related papers (2024-03-16T03:05:07Z)
- PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on [70.12285433529998]
PASTA-GAN++ is a versatile system for high-resolution unpaired virtual try-on.
It supports unsupervised training, arbitrary garment categories, and controllable garment editing.
arXiv Detail & Related papers (2022-07-27T11:47:49Z)
- Single Stage Virtual Try-on via Deformable Attention Flows [51.70606454288168]
Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image.
We develop a novel Deformable Attention Flow (DAFlow) which applies the deformable attention scheme to multi-flow estimation.
Our proposed method achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-07-19T10:01:31Z)
- Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN [66.3650689395967]
We propose a texture-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN (PASTA-GAN), that facilitates real-world unpaired virtual try-on.
To disentangle the style and spatial information of each garment, PASTA-GAN consists of an innovative patch-routed disentanglement module.
arXiv Detail & Related papers (2021-11-20T08:36:12Z)
- Shape Controllable Virtual Try-on for Underwear Models [0.0]
We propose a Shape Controllable Virtual Try-On Network (SC-VTON) to dress underwear models in clothing.
SC-VTON integrates model and clothing information to generate a warped clothing image.
Our method can generate high-resolution results with detailed textures.
arXiv Detail & Related papers (2021-07-28T04:01:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.