Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On
- URL: http://arxiv.org/abs/2406.10539v1
- Date: Sat, 15 Jun 2024 07:46:22 GMT
- Title: Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On
- Authors: Lingxiao Lu, Shengyi Wu, Haoxuan Sun, Junhong Gou, Jianlou Si, Chen Qian, Jianfu Zhang, Liqing Zhang
- Abstract summary: We introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) and a diffusion model.
Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by ViT, with their global counterparts.
The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences.
- Score: 21.422611451978863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual clothes try-on has emerged as a vital feature in online shopping, offering consumers a critical tool to visualize how clothing fits. In our research, we introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) coupled with a diffusion model. Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by ViT, with their global counterparts. Techniques such as conditional guidance and focus on key regions have been integrated into our approach. These combined strategies empower the diffusion model to reproduce clothing details with increased clarity and realism. The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences, significantly surpassing the capabilities of existing technologies.
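To make the local-versus-global idea concrete, below is a minimal, hedged sketch of contrasting ViT patch (local) embeddings against the image-level (global) embedding with an InfoNCE-style loss. The shapes, temperature, and exact loss formulation are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch: contrast ViT patch (local) embeddings of a clothing image
# with the image-level (global) embedding. Shapes, temperature, and the
# InfoNCE-style formulation are assumptions for illustration only.
import torch
import torch.nn.functional as F


def local_global_contrastive_loss(tokens: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """tokens: [B, 1 + N, D] ViT output, index 0 = class (global) token."""
    global_emb = F.normalize(tokens[:, 0], dim=-1)   # [B, D]
    local_emb = F.normalize(tokens[:, 1:], dim=-1)   # [B, N, D]

    # Similarity of every local patch to every image-level global embedding.
    logits = torch.einsum("bnd,kd->bnk", local_emb, global_emb) / temperature  # [B, N, B]

    # Positive pair: a patch and the global embedding of its own image.
    B, N, _ = logits.shape
    targets = torch.arange(B, device=tokens.device).view(B, 1).expand(B, N)
    return F.cross_entropy(logits.reshape(B * N, B), targets.reshape(B * N))


# Example with random features standing in for a ViT forward pass.
loss = local_global_contrastive_loss(torch.randn(4, 1 + 196, 768))
```

In the paper's setting such a term would be computed on clothing-image embeddings alongside the diffusion objective; the weighting between the two is not specified here.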
Related papers
- Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling across numerous image synthesis tasks.
We present a new diffusion model, GarDiff, which triggers a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 addresses the virtual dressing task of generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
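A rough, hedged sketch of such a hybrid attention block (a frozen self-attention path plus a trainable cross-attention path over garment tokens) is given below; dimensions, module names, and the residual fusion are assumptions rather than the released IMAGDressing-v1 code.

```python
# Hedged sketch of a hybrid attention block: frozen self-attention over
# denoiser hidden states plus a trainable cross-attention that attends to
# garment features. Dimensions and the additive fusion are assumptions.
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    def __init__(self, dim: int = 320, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Freeze the self-attention path; only cross-attention is trained.
        for p in self.self_attn.parameters():
            p.requires_grad = False

    def forward(self, hidden: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
        # hidden:  [B, L, dim] tokens from the (frozen) denoising UNet
        # garment: [B, M, dim] tokens from the garment UNet / encoders
        self_out, _ = self.self_attn(hidden, hidden, hidden)
        cross_out, _ = self.cross_attn(hidden, garment, garment)
        return hidden + self_out + cross_out  # simple residual fusion (assumed)


block = HybridAttention()
out = block(torch.randn(2, 64, 320), torch.randn(2, 77, 320))
```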
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon [5.790630195329777]
We introduce a novel graph-based warping technique which emphasizes the value of context in garment flow.
Our method, validated on VITON-HD and Dresscode datasets, showcases substantial improvement in garment warping, texture preservation, and overall realism.
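As a hedged illustration of what garment warping means operationally, the sketch below applies a predicted dense flow field to a flat garment image via grid sampling; the flow predictor is a toy stand-in, not GraVITON's graph-based, attention-guided module.

```python
# Hedged sketch: warp a flat garment image with a predicted dense flow
# field via grid sampling. The flow here is a stand-in for the output of
# a learned warping module such as the one described above.
import torch
import torch.nn.functional as F


def warp_garment(garment: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """garment: [B, 3, H, W] image, flow: [B, 2, H, W] pixel offsets (x, y)."""
    B, _, H, W = garment.shape
    # Base sampling grid in normalized [-1, 1] coordinates, ordered (x, y).
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=garment.device),
        torch.linspace(-1, 1, W, device=garment.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, H, W, 2)

    # Convert pixel offsets to normalized offsets and displace the grid.
    norm_flow = torch.stack(
        (flow[:, 0] * 2 / max(W - 1, 1), flow[:, 1] * 2 / max(H - 1, 1)), dim=-1
    )
    return F.grid_sample(garment, base + norm_flow, align_corners=True)


# Zero flow returns (approximately) the original garment image.
warped = warp_garment(torch.randn(1, 3, 256, 192), torch.zeros(1, 2, 256, 192))
```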
arXiv Detail & Related papers (2024-06-04T10:29:18Z)
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large gap.
AnyFit's impressive performance on high-fidelity virtual try-on in any scenario from any image paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- MV-VTON: Multi-View Virtual Try-On with Diffusion Models [91.71150387151042]
The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing.
Existing methods focus solely on frontal try-on using frontal clothing.
We introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes.
arXiv Detail & Related papers (2024-04-26T12:27:57Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on [7.46772222515689]
OOTDiffusion is a novel network architecture for realistic and controllable image-based virtual try-on.
We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features.
Our experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results.
arXiv Detail & Related papers (2024-03-04T07:17:44Z)
- PFDM: Parser-Free Virtual Try-on via Diffusion Model [28.202996582963184]
We propose a parser-free virtual try-on method based on a diffusion model (PFDM).
Given two images, PFDM can "wear" garments on the target person seamlessly through implicit warping, without requiring any other information.
Experiments demonstrate that our proposed PFDM can successfully handle complex images and outperform both state-of-the-art parser-free and parser-based models.
arXiv Detail & Related papers (2024-02-05T14:32:57Z)
- LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On [35.4056826207203]
This work introduces LaDI-VTON, the first Latent Diffusion textual Inversion-enhanced model for the Virtual Try-ON task.
The proposed architecture relies on a latent diffusion model extended with a novel additional autoencoder module.
We show that our approach outperforms the competitors by a consistent margin, achieving a significant milestone for the task.
arXiv Detail & Related papers (2023-05-22T21:38:06Z)
- C-VTON: Context-Driven Image-Based Virtual Try-On Network [1.0832844764942349]
We propose a Context-Driven Virtual Try-On Network (C-VTON) that convincingly transfers selected clothing items to the target subjects.
At the core of the C-VTON pipeline are: (i) a geometric matching procedure that efficiently aligns the target clothing with the pose of the person in the input images, and (ii) a powerful image generator that utilizes various types of contextual information when synthesizing the final try-on result.
arXiv Detail & Related papers (2022-12-08T17:56:34Z)
- Cloth Interactive Transformer for Virtual Try-On [106.21605249649957]
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information.
In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask.
arXiv Detail & Related papers (2021-04-12T14:45:32Z)