OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
Virtual Try-on
- URL: http://arxiv.org/abs/2403.01779v2
- Date: Thu, 7 Mar 2024 06:35:35 GMT
- Title: OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
Virtual Try-on
- Authors: Yuhao Xu, Tao Gu, Weifeng Chen, and Chengcai Chen
- Abstract summary: OOTDiffusion is a novel network architecture for realistic and controllable image-based virtual try-on.
We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features.
Our experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results.
- Score: 7.46772222515689
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present OOTDiffusion, a novel network architecture for realistic and
controllable image-based virtual try-on (VTON). We leverage the power of
pretrained latent diffusion models, designing an outfitting UNet to learn the
garment detail features. Without a redundant warping process, the garment
features are precisely aligned with the target human body via the proposed
outfitting fusion in the self-attention layers of the denoising UNet. In order
to further enhance the controllability, we introduce outfitting dropout to the
training process, which enables us to adjust the strength of the garment
features through classifier-free guidance. Our comprehensive experiments on the
VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently
generates high-quality try-on results for arbitrary human and garment images,
which outperforms other VTON methods in both realism and controllability,
indicating an impressive breakthrough in virtual try-on. Our source code is
available at https://github.com/levihsu/OOTDiffusion.
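As a rough, non-authoritative illustration of the two mechanisms described in the abstract, the sketch below shows (a) outfitting fusion as a self-attention step in which garment features are concatenated into the key/value sequence of the denoising UNet, and (b) outfitting dropout with classifier-free guidance at inference. All module names, tensor shapes, and default values here are assumptions for illustration, not the authors' released implementation (see the linked repository for that).

```python
# Minimal PyTorch sketch of outfitting fusion and outfitting dropout.
# Shapes, projection layers, and defaults are assumed for illustration only.
import torch
import torch.nn.functional as F

def outfitting_fusion(person_tokens, garment_tokens, to_q, to_k, to_v, to_out, num_heads=8):
    """Self-attention over person tokens whose keys/values also include garment
    tokens, so garment detail features are aligned without an explicit warping step."""
    b, n_p, c = person_tokens.shape
    q = to_q(person_tokens)                                 # (B, N_p, C)
    kv = torch.cat([person_tokens, garment_tokens], dim=1)  # (B, N_p + N_g, C)
    k, v = to_k(kv), to_v(kv)

    def split_heads(x):
        return x.view(b, -1, num_heads, c // num_heads).transpose(1, 2)

    out = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
    out = out.transpose(1, 2).reshape(b, n_p, c)
    return to_out(out)

def outfitting_dropout(garment_tokens, p=0.1, training=True):
    """During training, zero out the garment features with probability p so the
    model also learns an unconditional branch for classifier-free guidance."""
    if training and torch.rand(()).item() < p:
        return torch.zeros_like(garment_tokens)
    return garment_tokens

def guided_noise(eps_cond, eps_uncond, guidance_scale=1.5):
    """Classifier-free guidance at inference: guidance_scale adjusts the strength
    of the garment features in the predicted noise."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

In use, the unconditional prediction would come from a pass of the denoising UNet with the garment features dropped, and guided_noise would combine the two predictions at each denoising step.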
Related papers
- ODPG: Outfitting Diffusion with Pose Guided Condition [2.5602836891933074]
VTON technology allows users to visualize how clothes would look on them without physically trying them on.
Traditional VTON methods, often based on Generative Adversarial Networks (GANs) and diffusion models, face challenges in achieving high realism and handling dynamic poses.
This paper introduces Outfitting Diffusion with Pose Guided Condition (ODPG), a novel approach that leverages a latent diffusion model with multiple conditioning inputs during the denoising process.
arXiv Detail & Related papers (2025-01-12T10:30:27Z) - DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On [103.89972383310715]
DiffusionTrend harnesses latent information rich in priors to capture the nuances of garment details.
It delivers a visually compelling try-on experience, underscoring the potential of training-free diffusion models.
arXiv Detail & Related papers (2024-12-19T02:24:35Z) - TryOffAnyone: Tiled Cloth Generation from a Dressed Person [1.4732811715354452]
High-fidelity tiled garment images are essential for personalized recommendations, outfit composition, and virtual try-on systems.
We propose a novel approach utilizing a fine-tuned StableDiffusion model.
Our method features a streamlined single-stage network design, which integrates garment-specific masks to isolate and process target clothing items effectively.
arXiv Detail & Related papers (2024-12-11T17:41:53Z) - DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning [6.501730122478447]
DH-VTON is a deep text-driven virtual try-on model featuring a special hybrid attention learning strategy and deep garment semantic preservation module.
To extract the deep semantics of the garments, we first introduce InternViT-6B as a fine-grained feature learner, which can be trained to align with large-scale intrinsic knowledge.
To enhance the customized dressing abilities, we further introduce a Garment-Feature ControlNet Plus (GFC+) module.
arXiv Detail & Related papers (2024-10-16T12:27:10Z) - Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling in numerous image synthesis tasks.
We shape a new diffusion model, GarDiff, which triggers a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z) - IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 targets a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
It incorporates a garment UNet that captures semantic features from CLIP and texture features from a VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet (a rough sketch of this pattern appears after the list below).
arXiv Detail & Related papers (2024-07-17T16:26:30Z) - FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on [21.34959824429241]
FLDM-VTON is a novel Faithful Latent Diffusion Model for VTON.
It incorporates clothes as both the starting point and local condition, supplying the model with faithful clothes priors.
It is able to generate photo-realistic try-on images with faithful clothing details.
arXiv Detail & Related papers (2024-04-22T13:21:09Z) - Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z) - WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual
Try-on [81.15988741258683]
Image-based Virtual Try-On (VITON) aims to transfer an in-shop garment image onto a target person.
Current methods often overlook the synthesis quality around the garment-skin boundary and realistic effects like wrinkles and shadows on the warped garments.
We propose WarpDiffusion, which bridges the warping-based and diffusion-based paradigms via a novel informative and local garment feature attention mechanism.
arXiv Detail & Related papers (2023-12-06T18:34:32Z) - SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder that distills a source view into a compact representation, which guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners (a rough sketch of this bottleneck conditioning also appears after the list).
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
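As referenced in the IMAGDressing-v1 summary above, here is a loose sketch of a hybrid attention block that keeps the pretrained self-attention frozen and adds a trainable cross-attention over garment features; the structure, normalization, and residual wiring are assumptions for illustration rather than the authors' implementation.

```python
# Rough sketch of a frozen self-attention + trainable cross-attention block,
# assumed structure only (not the released IMAGDressing-v1 code).
import torch
import torch.nn as nn

class HybridAttentionBlock(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        # Self-attention reused from the pretrained denoising UNet and kept frozen.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        # Trainable cross-attention that injects garment features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, garment_feats):
        # x:             (B, N, C) latent tokens of the denoising UNet
        # garment_feats: (B, M, C) features from the garment UNet
        h, _ = self.self_attn(x, x, x, need_weights=False)
        x = x + h
        h, _ = self.cross_attn(self.norm(x), garment_feats, garment_feats, need_weights=False)
        return x + h
```

In the paper's setting, the frozen self-attention weights would be copied from the pretrained denoising UNet rather than freshly initialized as in this sketch.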
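The SODA entry also names a concrete mechanism: an image encoder whose tight bottleneck representation guides a denoising decoder. Below is a hedged sketch of that conditioning pattern, where the encoder, the FiLM-style modulation, and all dimensions are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of bottleneck conditioning: a compact code from an image encoder
# modulates a denoising decoder (assumed FiLM-style modulation, illustrative only).
import torch
import torch.nn as nn

class BottleneckEncoder(nn.Module):
    def __init__(self, in_dim=2048, z_dim=128):
        super().__init__()
        # Tight bottleneck: the source view is distilled into a small vector z.
        self.proj = nn.Linear(in_dim, z_dim)

    def forward(self, feats):          # feats: (B, in_dim) pooled image features
        return self.proj(feats)        # z:     (B, z_dim)

class ModulatedDecoderLayer(nn.Module):
    def __init__(self, dim, z_dim=128):
        super().__init__()
        self.block = nn.Linear(dim, dim)
        self.to_scale_shift = nn.Linear(z_dim, 2 * dim)

    def forward(self, h, z):           # h: (B, N, dim) decoder activations
        # The compact code z steers every decoder layer via scale and shift.
        scale, shift = self.to_scale_shift(z).unsqueeze(1).chunk(2, dim=-1)
        return self.block(h) * (1 + scale) + shift
```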