MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer
- URL: http://arxiv.org/abs/2502.01626v1
- Date: Mon, 03 Feb 2025 18:56:24 GMT
- Title: MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer
- Authors: Le Shen, Yanting Kang, Rong Huang, Zhijie Wang
- Abstract summary: Garment-to-person virtual try-on (VTON) aims to generate fitting images of a person wearing a reference garment.
To improve ease of use, we propose a Mask-Free framework for Person-to-Person VTON.
Our model excels in both person-to-person and garment-to-person VTON tasks, generating high-fidelity fitting images.
- Score: 5.844515709826269
- Abstract: The garment-to-person virtual try-on (VTON) task, which aims to generate fitting images of a person wearing a reference garment, has made significant strides. However, obtaining a standard garment is often more challenging than using the garment already worn by the person. To improve ease of use, we propose MFP-VTON, a Mask-Free framework for Person-to-Person VTON. Recognizing the scarcity of person-to-person data, we adapt a garment-to-person model and dataset to construct a specialized dataset for this task. Our approach builds upon a pretrained diffusion transformer, leveraging its strong generative capabilities. During mask-free model fine-tuning, we introduce a Focus Attention loss to emphasize the garment of the reference person and the details outside the garment of the target person. Experimental results demonstrate that our model excels in both person-to-person and garment-to-person VTON tasks, generating high-fidelity fitting images.
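To make the Focus Attention idea more concrete, here is a minimal, hypothetical sketch of a region-weighted denoising objective. The abstract names the loss but does not spell out its formulation, so the masks, the weighting scheme, and the assumption that the reference garment and target-person regions live in the same spatial frame as the denoised latent are all illustrative, not the authors' implementation.

```python
import torch

def focus_weighted_denoising_loss(
    eps_pred: torch.Tensor,          # predicted noise, (B, C, H, W)
    eps_true: torch.Tensor,          # ground-truth noise, (B, C, H, W)
    ref_garment_mask: torch.Tensor,  # 1 on the reference person's garment, (B, 1, H, W); assumed input
    tgt_detail_mask: torch.Tensor,   # 1 outside the target person's garment, (B, 1, H, W); assumed input
    focus_weight: float = 2.0,       # assumed up-weighting factor for the focus regions
) -> torch.Tensor:
    """Plain diffusion MSE loss with extra weight on the 'focus' regions."""
    per_pixel = (eps_pred - eps_true) ** 2
    # Union of the two regions the abstract says should be emphasized.
    focus = torch.clamp(ref_garment_mask + tgt_detail_mask, max=1.0)
    # Weight is 1 everywhere else and focus_weight inside the focus regions.
    weights = 1.0 + (focus_weight - 1.0) * focus
    return (weights * per_pixel).mean()
```

The intent this sketch tries to capture is simply that reconstruction errors on the reference garment and on the non-garment details of the target person should cost more than errors elsewhere.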
Related papers
- Try-On-Adapter: A Simple and Flexible Try-On Paradigm [42.2724473500475]
Image-based virtual try-on, widely used in online shopping, aims to generate images of a naturally dressed person conditioned on certain garments.
Previous methods focus on masking certain parts of the original model's standing image and then inpainting the masked areas to generate realistic images of the model wearing the corresponding reference garments.
We propose Try-On-Adapter (TOA), an outpainting paradigm that differs from the existing inpainting paradigm.
arXiv Detail & Related papers (2024-11-15T13:35:58Z) - FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on [73.13242624924814]
FitDiT, a garment perception enhancement technique, is designed for high-fidelity virtual try-on using Diffusion Transformers (DiT).
We introduce a garment texture extractor that incorporates garment-prior evolution to fine-tune garment features, better capturing rich details such as stripes, patterns, and text.
We also employ a dilated-relaxed mask strategy that adapts to the correct length of garments, preventing the generation of garments that fill the entire mask area during cross-category try-on.
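The dilated-relaxed mask idea lends itself to a short, hedged sketch. FitDiT's actual procedure is not given in this summary, so the max-pooling dilation and the bounding-box relaxation below are illustrative assumptions rather than the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def dilated_relaxed_mask(garment_mask: torch.Tensor, dilation: int = 15) -> torch.Tensor:
    """garment_mask: (B, 1, H, W) binary mask of the garment currently worn; values are assumptions."""
    # 1) Dilate: max-pooling with stride 1 grows the mask by `dilation` pixels.
    k = 2 * dilation + 1
    dilated = F.max_pool2d(garment_mask, kernel_size=k, stride=1, padding=dilation)
    # 2) Relax: replace the dilated mask with its bounding box, so the generated garment
    #    is not forced to follow the silhouette (or length) of the old one.
    relaxed = torch.zeros_like(dilated)
    for b in range(dilated.shape[0]):
        ys, xs = torch.nonzero(dilated[b, 0], as_tuple=True)
        if ys.numel() > 0:
            relaxed[b, 0, ys.min():ys.max() + 1, xs.min():xs.max() + 1] = 1.0
    return relaxed
```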
arXiv Detail & Related papers (2024-11-15T11:02:23Z) - Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling in numerous image synthesis tasks.
We present a new diffusion model, GarDiff, which triggers a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z) - IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 addresses the virtual dressing task, generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
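The hybrid attention module can be sketched briefly. The version below (a frozen self-attention plus a trainable cross-attention over garment tokens, combined as a residual) only mirrors the one-sentence description above; the dimensions, head count, and residual summation are assumptions rather than IMAGDressing-v1's released code.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    def __init__(self, dim: int = 320, num_heads: int = 8):
        super().__init__()
        # Self-attention is reused from the pretrained denoising UNet and kept frozen.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        # Cross-attention over garment features is the only trainable part.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, hidden: torch.Tensor, garment_feats: torch.Tensor) -> torch.Tensor:
        # hidden:        (B, N, dim) tokens from the frozen denoising UNet
        # garment_feats: (B, M, dim) tokens from the garment UNet (semantic + texture features)
        self_out, _ = self.self_attn(hidden, hidden, hidden)
        cross_out, _ = self.cross_attn(hidden, garment_feats, garment_feats)
        return hidden + self_out + cross_out
```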
arXiv Detail & Related papers (2024-07-17T16:26:30Z) - MV-VTON: Multi-View Virtual Try-On with Diffusion Models [91.71150387151042]
The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing.
Existing methods focus solely on frontal try-on using frontal clothing.
We introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes.
arXiv Detail & Related papers (2024-04-26T12:27:57Z) - Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On [29.217423805933727]
Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks.
We propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results.
We also propose a novel diffusion-based method that predicts a precise inpainting mask based on the person and reference garment images.
arXiv Detail & Related papers (2024-04-01T12:43:22Z) - PFDM: Parser-Free Virtual Try-on via Diffusion Model [28.202996582963184]
We propose a parser-free virtual try-on method based on the diffusion model (PFDM).
Given two images, PFDM can "wear" garments on the target person seamlessly by implicitly warping without any other information.
Experiments demonstrate that our proposed PFDM can successfully handle complex images and outperforms both state-of-the-art parser-free and parser-based models.
arXiv Detail & Related papers (2024-02-05T14:32:57Z) - AniDress: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model [58.035758145894846]
We introduce AniDress, a novel method for generating animatable human avatars in loose clothes using very sparse multi-view videos.
A pose-driven deformable neural radiance field conditioned on both body and garment motions is introduced, providing explicit control of both parts.
Our method is able to render natural garment dynamics that deviate highly from the body, and generalizes well to both unseen views and poses.
arXiv Detail & Related papers (2024-01-27T08:48:18Z) - StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On [35.227896906556026]
Given a clothing image and a person image, an image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image.
In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task.
Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process.
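A hedged sketch of what such a zero cross-attention block might look like is given below: the output projection is zero-initialized, so at the start of fine-tuning the block leaves the pretrained features untouched and only gradually injects clothing information. The layer sizes and normalization are assumptions, not StableVITON's released architecture.

```python
import torch
import torch.nn as nn

class ZeroCrossAttention(nn.Module):
    def __init__(self, dim: int = 320, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj_out = nn.Linear(dim, dim)
        # Zero init: the residual contribution is exactly zero at the start of training,
        # so the pretrained diffusion model is initially undisturbed.
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, unet_feats: torch.Tensor, cloth_feats: torch.Tensor) -> torch.Tensor:
        # unet_feats:  (B, N, dim) pretrained UNet tokens (queries)
        # cloth_feats: (B, M, dim) clothing-image tokens (keys/values)
        attn_out, _ = self.attn(self.norm(unet_feats), cloth_feats, cloth_feats)
        return unet_feats + self.proj_out(attn_out)
```

Zero initialization of the residual path is the same design choice used by ControlNet-style adapters to avoid disturbing the pretrained model early in training.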
arXiv Detail & Related papers (2023-12-04T08:27:59Z) - Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN [66.3650689395967]
We propose a texture-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN (PASTA-GAN), that facilitates real-world unpaired virtual try-on.
To disentangle the style and spatial information of each garment, PASTA-GAN consists of an innovative patch-routed disentanglement module.
arXiv Detail & Related papers (2021-11-20T08:36:12Z)