OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
Virtual Try-on
- URL: http://arxiv.org/abs/2403.01779v2
- Date: Thu, 7 Mar 2024 06:35:35 GMT
- Title: OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
Virtual Try-on
- Authors: Yuhao Xu, Tao Gu, Weifeng Chen, and Chengcai Chen
- Abstract summary: OOTDiffusion is a novel network architecture for realistic and controllable image-based virtual try-on.
We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features.
Our experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results.
- Score: 7.46772222515689
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present OOTDiffusion, a novel network architecture for realistic and
controllable image-based virtual try-on (VTON). We leverage the power of
pretrained latent diffusion models, designing an outfitting UNet to learn the
garment detail features. Without a redundant warping process, the garment
features are precisely aligned with the target human body via the proposed
outfitting fusion in the self-attention layers of the denoising UNet. In order
to further enhance the controllability, we introduce outfitting dropout to the
training process, which enables us to adjust the strength of the garment
features through classifier-free guidance. Our comprehensive experiments on the
VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently
generates high-quality try-on results for arbitrary human and garment images,
which outperforms other VTON methods in both realism and controllability,
indicating an impressive breakthrough in virtual try-on. Our source code is
available at https://github.com/levihsu/OOTDiffusion.
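As a rough, non-authoritative illustration of the two mechanisms described in the abstract, the sketch below shows (a) outfitting fusion as a self-attention step in which garment features are concatenated into the key/value sequence of the denoising UNet, and (b) outfitting dropout with classifier-free guidance at inference. All module names, tensor shapes, and default values here are assumptions for illustration, not the authors' released implementation (see the linked repository for that).

```python
# Minimal PyTorch sketch of outfitting fusion and outfitting dropout.
# Shapes, projection layers, and defaults are assumed for illustration only.
import torch
import torch.nn.functional as F

def outfitting_fusion(person_tokens, garment_tokens, to_q, to_k, to_v, to_out, num_heads=8):
    """Self-attention over person tokens whose keys/values also include garment
    tokens, so garment detail features are aligned without an explicit warping step."""
    b, n_p, c = person_tokens.shape
    q = to_q(person_tokens)                                 # (B, N_p, C)
    kv = torch.cat([person_tokens, garment_tokens], dim=1)  # (B, N_p + N_g, C)
    k, v = to_k(kv), to_v(kv)

    def split_heads(x):
        return x.view(b, -1, num_heads, c // num_heads).transpose(1, 2)

    out = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
    out = out.transpose(1, 2).reshape(b, n_p, c)
    return to_out(out)

def outfitting_dropout(garment_tokens, p=0.1, training=True):
    """During training, zero out the garment features with probability p so the
    model also learns an unconditional branch for classifier-free guidance."""
    if training and torch.rand(()).item() < p:
        return torch.zeros_like(garment_tokens)
    return garment_tokens

def guided_noise(eps_cond, eps_uncond, guidance_scale=1.5):
    """Classifier-free guidance at inference: guidance_scale adjusts the strength
    of the garment features in the predicted noise."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

In use, the unconditional prediction would come from a pass of the denoising UNet with the garment features dropped, and guided_noise would combine the two predictions at each denoising step.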
Related papers
- ODPG: Outfitting Diffusion with Pose Guided Condition [2.5602836891933074]
VTON technology allows users to visualize how clothes would look on them without physically trying them on.
Traditional VTON methods, often based on Generative Adversarial Networks (GANs) and diffusion models, face challenges in achieving high realism and handling dynamic poses.
This paper introduces Outfitting Diffusion with Pose Guided Condition (ODPG), a novel approach that leverages a latent diffusion model with multiple conditioning inputs during the denoising process.
arXiv Detail & Related papers (2025-01-12T10:30:27Z) - DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On [103.89972383310715]
DiffusionTrend harnesses latent information rich in priors to capture the nuances of garment details.
It delivers a visually compelling try-on experience, underscoring the potential of training-free diffusion models.
arXiv Detail & Related papers (2024-12-19T02:24:35Z) - TryOffAnyone: Tiled Cloth Generation from a Dressed Person [1.4732811715354452]
High-fidelity tiled garment images are essential for personalized recommendations, outfit composition, and virtual try-on systems.
We propose a novel approach utilizing a fine-tuned StableDiffusion model.
Our method features a streamlined single-stage network design, which integrates garment-specific masks to isolate and process target clothing items effectively.
arXiv Detail & Related papers (2024-12-11T17:41:53Z) - DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning [6.501730122478447]
DH-VTON is a deep text-driven virtual try-on model featuring a special hybrid attention learning strategy and deep garment semantic preservation module.
To extract the deep semantics of the garments, we first introduce InternViT-6B as a fine-grained feature learner, which can be trained to align with large-scale intrinsic knowledge.
To enhance the customized dressing abilities, we further introduce a Garment-Feature ControlNet Plus (GFC+) module.
arXiv Detail & Related papers (2024-10-16T12:27:10Z) - Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling in numerous image synthesis tasks.
We shape a new diffusion model, GarDiff, which triggers a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z) - IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 targets a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
It incorporates a garment UNet that captures semantic features from CLIP and texture features from a VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet (a rough sketch of this pattern appears after the list below).
arXiv Detail & Related papers (2024-07-17T16:26:30Z) - FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on [21.34959824429241]
FLDM-VTON is a novel Faithful Latent Diffusion Model for VTON.
It incorporates clothes as both the starting point and local condition, supplying the model with faithful clothes priors.
It is able to generate photo-realistic try-on images with faithful clothing details.
arXiv Detail & Related papers (2024-04-22T13:21:09Z) - Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z) - WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual
Try-on [81.15988741258683]
Image-based Virtual Try-On (VITON) aims to transfer an in-shop garment image onto a target person.
Current methods often overlook the synthesis quality around the garment-skin boundary and realistic effects like wrinkles and shadows on the warped garments.
We propose WarpDiffusion, which bridges the warping-based and diffusion-based paradigms via a novel informative and local garment feature attention mechanism.
arXiv Detail & Related papers (2023-12-06T18:34:32Z) - SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder that distills a source view into a compact representation, which guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners (a rough sketch of this bottleneck conditioning also appears after the list).
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
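As referenced in the IMAGDressing-v1 summary above, here is a loose sketch of a hybrid attention block that keeps the pretrained self-attention frozen and adds a trainable cross-attention over garment features; the structure, normalization, and residual wiring are assumptions for illustration rather than the authors' implementation.

```python
# Rough sketch of a frozen self-attention + trainable cross-attention block,
# assumed structure only (not the released IMAGDressing-v1 code).
import torch
import torch.nn as nn

class HybridAttentionBlock(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        # Self-attention reused from the pretrained denoising UNet and kept frozen.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        # Trainable cross-attention that injects garment features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, garment_feats):
        # x:             (B, N, C) latent tokens of the denoising UNet
        # garment_feats: (B, M, C) features from the garment UNet
        h, _ = self.self_attn(x, x, x, need_weights=False)
        x = x + h
        h, _ = self.cross_attn(self.norm(x), garment_feats, garment_feats, need_weights=False)
        return x + h
```

In the paper's setting, the frozen self-attention weights would be copied from the pretrained denoising UNet rather than freshly initialized as in this sketch.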
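The SODA entry also names a concrete mechanism: an image encoder whose tight bottleneck representation guides a denoising decoder. Below is a hedged sketch of that conditioning pattern, where the encoder, the FiLM-style modulation, and all dimensions are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of bottleneck conditioning: a compact code from an image encoder
# modulates a denoising decoder (assumed FiLM-style modulation, illustrative only).
import torch
import torch.nn as nn

class BottleneckEncoder(nn.Module):
    def __init__(self, in_dim=2048, z_dim=128):
        super().__init__()
        # Tight bottleneck: the source view is distilled into a small vector z.
        self.proj = nn.Linear(in_dim, z_dim)

    def forward(self, feats):          # feats: (B, in_dim) pooled image features
        return self.proj(feats)        # z:     (B, z_dim)

class ModulatedDecoderLayer(nn.Module):
    def __init__(self, dim, z_dim=128):
        super().__init__()
        self.block = nn.Linear(dim, dim)
        self.to_scale_shift = nn.Linear(z_dim, 2 * dim)

    def forward(self, h, z):           # h: (B, N, dim) decoder activations
        # The compact code z steers every decoder layer via scale and shift.
        scale, shift = self.to_scale_shift(z).unsqueeze(1).chunk(2, dim=-1)
        return self.block(h) * (1 + scale) + shift
```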