TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
- URL: http://arxiv.org/abs/2411.18350v2
- Date: Thu, 07 Aug 2025 19:11:41 GMT
- Title: TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
- Authors: Riza Velioglu, Petra Bevandic, Robin Chan, Barbara Hammer,
- Abstract summary: We introduce Virtual Try-Off (VTOFF), a novel task generating standardized garment images from single photos of clothed individuals. TryOffDiff adapts Stable Diffusion with SigLIP-based visual conditioning to deliver high-fidelity reconstructions. Our findings highlight VTOFF's potential to improve e-commerce product imagery, advance generative model evaluation, and guide future research on high-fidelity reconstruction.
- Score: 8.158200403139196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces Virtual Try-Off (VTOFF), a novel task generating standardized garment images from single photos of clothed individuals. Unlike Virtual Try-On (VTON), which digitally dresses models, VTOFF extracts canonical garment images, demanding precise reconstruction of shape, texture, and complex patterns, enabling robust evaluation of generative model fidelity. We propose TryOffDiff, adapting Stable Diffusion with SigLIP-based visual conditioning to deliver high-fidelity reconstructions. Experiments on VITON-HD and Dress Code datasets show that TryOffDiff outperforms adapted pose transfer and VTON baselines. We observe that traditional metrics such as SSIM inadequately reflect reconstruction quality, prompting our use of DISTS for reliable assessment. Our findings highlight VTOFF's potential to improve e-commerce product imagery, advance generative model evaluation, and guide future research on high-fidelity reconstruction. Demo, code, and models are available at: https://rizavelioglu.github.io/tryoffdiff
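The SigLIP-based visual conditioning described in the abstract can be illustrated with a minimal sketch: the reference photo's patch features stand in for the text embeddings that Stable Diffusion's UNet normally attends to. This is a hypothetical illustration, not the authors' implementation; the dimensions, the `VisualConditioner` module, and the projection design are assumptions, and a random tensor stands in for actual SigLIP output.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not taken from the paper):
SIGLIP_DIM = 768       # width of SigLIP patch embeddings
CROSS_ATTN_DIM = 1024  # cross-attention context width of the diffusion UNet
NUM_PATCHES = 196      # 14x14 patch grid for a 224px input image

class VisualConditioner(nn.Module):
    """Projects SigLIP patch features into the UNet's cross-attention space,
    so image features can replace the usual text-encoder context."""
    def __init__(self, in_dim=SIGLIP_DIM, out_dim=CROSS_ATTN_DIM):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.LayerNorm(out_dim),
        )

    def forward(self, patch_feats):
        # patch_feats: (batch, NUM_PATCHES, SIGLIP_DIM)
        return self.proj(patch_feats)  # (batch, NUM_PATCHES, CROSS_ATTN_DIM)

# Stand-in for SigLIP features of a batch of 2 reference photos.
feats = torch.randn(2, NUM_PATCHES, SIGLIP_DIM)
context = VisualConditioner()(feats)
print(context.shape)  # torch.Size([2, 196, 1024])
```

In a diffusers-style pipeline, `context` would then be passed to the UNet as its cross-attention conditioning (the argument conventionally named `encoder_hidden_states`) in place of CLIP text embeddings.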
Related papers
- VTONQA: A Multi-Dimensional Quality Assessment Dataset for Virtual Try-on [83.39966045949338]
VTONQA is the first multi-dimensional quality assessment dataset specifically designed for VTON. It contains 8,132 images generated by 11 representative VTON models, along with 24,396 mean opinion scores (MOSs) across three evaluation dimensions. We benchmark both VTON models and a diverse set of image quality assessment (IQA) metrics, revealing the limitations of existing methods.
arXiv Detail & Related papers (2026-01-06T11:42:26Z) - Rethinking Garment Conditioning in Diffusion-based Virtual Try-On [7.386027762996787]
We develop Re-CatVTON, an efficient single-UNet model that achieves high performance. The proposed Re-CatVTON significantly improves performance compared to its predecessor. Our results demonstrate improved FID, KID, and LPIPS scores, with only a marginal decrease in SSIM.
arXiv Detail & Related papers (2025-11-24T05:19:44Z) - UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction [73.29048162438797]
We introduce UniVerse, a unified framework for robust reconstruction based on a video diffusion model. Specifically, UniVerse first converts inconsistent images into initial videos, then uses a specially designed video diffusion model to restore them into consistent images. Experiments on both synthetic and real-world datasets demonstrate the strong generalization capability and superior performance of our method in robust reconstruction.
arXiv Detail & Related papers (2025-10-02T04:50:18Z) - Undress to Redress: A Training-Free Framework for Virtual Try-On [19.00614787972817]
We propose UR-VTON (Undress-Redress Virtual Try-ON), a training-free framework that can be seamlessly integrated with any existing VTON method. UR-VTON introduces an "undress-to-redress" mechanism: it first reveals the user's torso by virtually "undressing", then applies the target short-sleeve garment. We also present LS-TON, a new benchmark for long-sleeve-to-short-sleeve try-on.
arXiv Detail & Related papers (2025-08-11T06:55:49Z) - Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals [76.96387718150542]
We present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF). Our architecture is designed to receive garment information from multiple modalities like images, text, and masks to work in a multi-category setting. Experiments on VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state-of-the-art on the VTOFF task.
arXiv Detail & Related papers (2025-05-27T11:47:51Z) - Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling [20.072689146353348]
We introduce a garment extraction model that generates (human, synthetic garment) pairs from a single image of a clothed individual.
We also propose an Error-Aware Refinement-based Schrödinger Bridge (EARSB) that surgically targets localized generation errors.
In user studies, our model is preferred by the users in an average of 59% of cases.
arXiv Detail & Related papers (2025-01-08T18:25:50Z) - TryOffAnyone: Tiled Cloth Generation from a Dressed Person [1.4732811715354452]
High-fidelity tiled garment images are essential for personalized recommendations, outfit composition, and virtual try-on systems.
We propose a novel approach utilizing a fine-tuned Stable Diffusion model.
Our method features a streamlined single-stage network design, which integrates garment-specific masks to isolate and process target clothing items effectively.
arXiv Detail & Related papers (2024-12-11T17:41:53Z) - Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling in numerous image synthesis tasks.
We shape a new Diffusion model, namely GarDiff, which triggers the garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z) - Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Our approach is relatively unified, making it resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z) - AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
AnyFit's impressive performance on high-fidelity virtual try-on in any scenario from any image paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z) - FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on [21.34959824429241]
FLDM-VTON is a novel Faithful Latent Diffusion Model for VTON.
It incorporates clothes as both the starting point and local condition, supplying the model with faithful clothes priors.
It is able to generate photo-realistic try-on images with faithful clothing details.
arXiv Detail & Related papers (2024-04-22T13:21:09Z) - Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On [29.217423805933727]
Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks.
We propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results.
We also propose a novel diffusion-based method that predicts a precise inpainting mask based on the person and reference garment images.
arXiv Detail & Related papers (2024-04-01T12:43:22Z) - Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z) - JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement [69.6035373784027]
Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models.
Previous methods may neglect the importance of a sufficient formulation of task-specific condition strategy.
We propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition.
arXiv Detail & Related papers (2023-12-20T08:05:57Z) - CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model [38.08115084929579]
Generative Adversarial Networks (GANs) dominate the research field in image-based virtual try-on.
We propose Controllable Accelerated virtual Try-on with Diffusion Model (CAT-DM).
arXiv Detail & Related papers (2023-11-30T09:56:17Z) - Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey [96.99328714941657]
We present a comprehensive review of recent diffusion model-based methods on image restoration.
We classify and emphasize the innovative designs using diffusion models for both IR and blind/real-world IR.
We propose five potential and challenging directions for the future research of diffusion model-based IR.
arXiv Detail & Related papers (2023-08-18T08:40:38Z) - DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and generates more photorealistic images specifically.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.