OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
- URL: http://arxiv.org/abs/2403.01779v2
- Date: Thu, 7 Mar 2024 06:35:35 GMT
- Title: OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
- Authors: Yuhao Xu, Tao Gu, Weifeng Chen, and Chengcai Chen
- Abstract summary: OOTDiffusion is a novel network architecture for realistic and controllable image-based virtual try-on.
We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features.
Our experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results.
- Score: 7.46772222515689
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present OOTDiffusion, a novel network architecture for realistic and
controllable image-based virtual try-on (VTON). We leverage the power of
pretrained latent diffusion models, designing an outfitting UNet to learn the
garment detail features. Without a redundant warping process, the garment
features are precisely aligned with the target human body via the proposed
outfitting fusion in the self-attention layers of the denoising UNet. In order
to further enhance the controllability, we introduce outfitting dropout to the
training process, which enables us to adjust the strength of the garment
features through classifier-free guidance. Our comprehensive experiments on the
VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently
generates high-quality try-on results for arbitrary human and garment images,
which outperforms other VTON methods in both realism and controllability,
indicating an impressive breakthrough in virtual try-on. Our source code is
available at https://github.com/levihsu/OOTDiffusion.
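The abstract's two key mechanisms lend themselves to a short sketch: outfitting fusion concatenates garment features with the denoising UNet's latent features inside self-attention, so garment detail is injected without an explicit warping step, and outfitting dropout makes the garment condition optional during training, so its strength can be adjusted at inference with classifier-free guidance. The following Python sketch is illustrative only; the module names, tensor shapes, and the unet call signature are assumptions, not the released implementation.

```python
# Minimal sketch (assumed names and shapes) of outfitting fusion and
# garment-strength control via classifier-free guidance.
import torch
import torch.nn.functional as F


def outfitting_fusion_self_attention(human_tokens, garment_tokens,
                                     to_q, to_k, to_v, to_out):
    """Self-attention over the concatenation of human and garment tokens.

    human_tokens:   (B, N, C) latent features from a denoising-UNet block
    garment_tokens: (B, M, C) features from the outfitting (garment) UNet block
    to_q/to_k/to_v/to_out: the block's linear projection layers
    """
    fused = torch.cat([human_tokens, garment_tokens], dim=1)   # (B, N+M, C)
    q, k, v = to_q(fused), to_k(fused), to_v(fused)
    attn = F.scaled_dot_product_attention(q, k, v)              # (B, N+M, C)
    # Only the human (denoising) half of the sequence is kept and passed on,
    # so garment detail is injected without an explicit warping step.
    return to_out(attn[:, : human_tokens.shape[1]])


def guided_noise_prediction(unet, latents, t, garment_feats, guidance_scale=1.5):
    """Classifier-free guidance over the garment condition.

    Because outfitting dropout randomly removes the garment features during
    training, the model can be queried both with and without them at
    inference; `guidance_scale` then controls how strongly the garment is
    imposed. The `unet(latents, t, garment=...)` signature is hypothetical.
    """
    eps_cond = unet(latents, t, garment=garment_feats)
    eps_uncond = unet(latents, t, garment=None)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

In such a setup, a larger guidance_scale imposes the garment features more strongly, which is the controllability knob the abstract describes.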
Related papers
- DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning [6.501730122478447]
DH-VTON is a deep text-driven virtual try-on model featuring a special hybrid attention learning strategy and a deep garment semantic preservation module.
To extract the deep semantics of the garments, we first introduce InternViT-6B as a fine-grained feature learner, which can be trained to align with large-scale intrinsic knowledge.
To enhance the customized dressing abilities, we further introduce the Garment-Feature ControlNet Plus (GFC+) module.
arXiv Detail & Related papers (2024-10-16T12:27:10Z)
- Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling across numerous image synthesis tasks.
We propose a new diffusion model, namely GarDiff, which drives a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 targets a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on [21.34959824429241]
FLDM-VTON is a novel Faithful Latent Diffusion Model for VTON.
It incorporates clothes as both the starting point and local condition, supplying the model with faithful clothes priors.
It is able to generate photo-realistic try-on images with faithful clothing details.
arXiv Detail & Related papers (2024-04-22T13:21:09Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- PFDM: Parser-Free Virtual Try-on via Diffusion Model [28.202996582963184]
We propose a parser-free virtual try-on method based on the diffusion model (PFDM).
Given two images, PFDM can "wear" garments on the target person seamlessly via implicit warping, without any other information.
Experiments demonstrate that our proposed PFDM can successfully handle complex images and outperform both state-of-the-art parser-free and parser-based models.
arXiv Detail & Related papers (2024-02-05T14:32:57Z)
- WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on [81.15988741258683]
Image-based Virtual Try-On (VITON) aims to transfer an in-shop garment image onto a target person.
Current methods often overlook the synthesis quality around the garment-skin boundary and realistic effects like wrinkles and shadows on the warped garments.
We propose WarpDiffusion, which bridges the warping-based and diffusion-based paradigms via a novel informative and local garment feature attention mechanism.
arXiv Detail & Related papers (2023-12-06T18:34:32Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow [24.187109053871833]
Virtual try-on is a critical image synthesis task that aims to transfer clothes from one image to another while preserving the details of both humans and clothes.
We propose an exemplar-based inpainting approach that leverages a warping module to guide the diffusion model's generation effectively.
Our approach, namely Diffusion-based Conditional Inpainting for Virtual Try-ON (DCI-VTON), effectively utilizes the power of the diffusion model.
arXiv Detail & Related papers (2023-08-11T12:23:09Z)
- Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN [66.3650689395967]
We propose a texture-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN (PASTA-GAN), that facilitates real-world unpaired virtual try-on.
To disentangle the style and spatial information of each garment, PASTA-GAN introduces an innovative patch-routed disentanglement module.
arXiv Detail & Related papers (2021-11-20T08:36:12Z)