Undress to Redress: A Training-Free Framework for Virtual Try-On
- URL: http://arxiv.org/abs/2508.07680v1
- Date: Mon, 11 Aug 2025 06:55:49 GMT
- Title: Undress to Redress: A Training-Free Framework for Virtual Try-On
- Authors: Zhiying Li, Junhao Wu, Yeying Jin, Daiheng Gao, Yun Ji, Kaichuan Kong, Lei Yu, Hao Xu, Kai Chen, Bruce Gu, Nana Wang, Zhaoxin Fan
- Abstract summary: We propose UR-VTON (Undress-Redress Virtual Try-ON), a training-free framework that can be seamlessly integrated with any existing VTON method. UR-VTON introduces an "undress-to-redress" mechanism: it first reveals the user's torso by virtually "undressing," then applies the target short-sleeve garment. We also present LS-TON, a new benchmark for long-sleeve-to-short-sleeve try-on.
- Score: 19.00614787972817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Virtual try-on (VTON) is a crucial task for enhancing user experience in online shopping by generating realistic garment previews on personal photos. Although existing methods have achieved impressive results, they struggle with long-sleeve-to-short-sleeve conversions, a common and practical scenario, often producing unrealistic outputs when exposed skin is underrepresented in the original image. We argue that this challenge arises from the "majority" completion rule in current VTON models, which leads to inaccurate skin restoration in such cases. To address this, we propose UR-VTON (Undress-Redress Virtual Try-ON), a novel, training-free framework that can be seamlessly integrated with any existing VTON method. UR-VTON introduces an "undress-to-redress" mechanism: it first reveals the user's torso by virtually "undressing," then applies the target short-sleeve garment, effectively decomposing the conversion into two more manageable steps. Additionally, we incorporate Dynamic Classifier-Free Guidance scheduling to balance diversity and image quality during DDPM sampling, and employ a Structural Refiner to enhance detail fidelity using high-frequency cues. Finally, we present LS-TON, a new benchmark for long-sleeve-to-short-sleeve try-on. Extensive experiments demonstrate that UR-VTON outperforms state-of-the-art methods in both detail preservation and image quality. Code will be released upon acceptance.
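The dynamic classifier-free guidance scheduling mentioned in the abstract can be illustrated with a minimal sketch. The cosine ramp, the `w_min`/`w_max` bounds, and both function names below are assumptions chosen for illustration; the abstract does not specify UR-VTON's actual schedule, only that the guidance weight varies across DDPM sampling steps to trade off diversity against image quality.

```python
import math

def cfg_scale(step: int, total_steps: int,
              w_min: float = 1.5, w_max: float = 7.5) -> float:
    """Illustrative dynamic CFG schedule: a cosine ramp from w_max at the
    early, structure-defining steps down to w_min at the late,
    detail-refining steps. The bounds and ramp shape are assumptions."""
    t = step / max(total_steps - 1, 1)
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * t))

def guided_noise(eps_uncond: float, eps_cond: float, w: float) -> float:
    """Standard classifier-free guidance: extrapolate from the
    unconditional toward the conditional noise prediction by weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

In a DDPM sampling loop, `cfg_scale(step, total_steps)` would replace the usual fixed guidance weight, so early steps are guided strongly while later steps relax toward the unconditional prediction.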
Related papers
- OmniVTON++: Training-Free Universal Virtual Try-On with Principal Pose Guidance [85.23143742905695]
Image-based Virtual Try-On (VTON) concerns the synthesis of realistic person imagery through garment re-rendering under human pose and body constraints. We present OmniVTON++, a training-free VTON framework designed for universal applicability.
arXiv Detail & Related papers (2026-02-16T08:27:43Z)
- GO-MLVTON: Garment Occlusion-Aware Multi-Layer Virtual Try-On with Diffusion Models [37.32099831689131]
Existing image-based virtual try-on (VTON) methods primarily focus on single-layer or multi-garment VTON. We propose GO-MLVTON, the first multi-layer VTON method, introducing the Garment Occlusion Learning module and the StableDiffusion-based Garment Morphing & Fitting module. We present the MLG dataset for this task and propose a new metric named Layered Appearance Coherence Difference (LACD) for evaluation.
arXiv Detail & Related papers (2026-01-20T02:20:34Z)
- MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization [19.780800887427937]
We introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. This architecture supports prompt-based customization, allowing fine-grained garment modifications with minimal user input.
arXiv Detail & Related papers (2025-08-11T21:45:07Z)
- OmniVTON: Training-Free Universal Virtual Try-On [53.31945401098557]
Image-based Virtual Try-On (VTON) techniques rely on either supervised in-shop approaches, or unsupervised in-the-wild methods, which improve adaptability but remain constrained by data biases and limited universality. We propose OmniVTON, the first training-free universal VTON framework that decouples garment and pose conditioning to achieve both texture fidelity and pose consistency across diverse settings.
arXiv Detail & Related papers (2025-07-20T16:37:53Z)
- Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals [76.96387718150542]
We present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF). Our architecture is designed to receive garment information from multiple modalities like images, text, and masks to work in a multi-category setting. Experiments on VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state-of-the-art on the VTOFF task.
arXiv Detail & Related papers (2025-05-27T11:47:51Z)
- ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text [11.85544970521423]
We introduce ITVTON, which utilizes the Diffusion Transformer (DiT) as a generator to enhance image quality. ITVTON improves garment-person interaction by stitching garment and person images along the spatial channel. We constrain training to attention parameters within a single Diffusion Transformer (Single-DiT) block.
arXiv Detail & Related papers (2025-01-28T07:24:15Z)
- TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models [8.158200403139196]
We introduce Virtual Try-Off (VTOFF), a novel task generating standardized garment images from single photos of clothed individuals. TryOffDiff adapts Stable Diffusion with SigLIP-based visual conditioning to deliver high-fidelity reconstructions. Our findings highlight VTOFF's potential to improve e-commerce product imagery, advance generative model evaluation, and guide future research on high-fidelity reconstruction.
arXiv Detail & Related papers (2024-11-27T13:53:09Z)
- High-Fidelity Virtual Try-on with Large-Scale Unpaired Learning [36.7085107012134]
Virtual try-on (VTON) transfers a target clothing image to a reference person, where clothing fidelity is a key requirement for downstream e-commerce applications.
We propose a novel framework, Boosted Virtual Try-on (BVTON), to leverage large-scale unpaired learning for high-fidelity try-on.
arXiv Detail & Related papers (2024-11-03T15:00:26Z)
- Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling across numerous image synthesis tasks.
We present a new diffusion model, GarDiff, which triggers a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 is a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
AnyFit's impressive performance on high-fidelity virtual try-ons in any scenario from any image paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on [7.46772222515689]
OOTDiffusion is a novel network architecture for realistic and controllable image-based virtual try-on.
We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features.
Our experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results.
arXiv Detail & Related papers (2024-03-04T07:17:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.