DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On
- URL: http://arxiv.org/abs/2506.23295v1
- Date: Sun, 29 Jun 2025 15:31:42 GMT
- Title: DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On
- Authors: Xiang Xu,
- Abstract summary: Virtual try-on (VTON) aims to synthesize realistic images of a person wearing a target garment, with broad applications in e-commerce and digital fashion. We propose DiffFit, a novel two-stage latent diffusion framework for high-fidelity virtual try-on.
- Score: 3.5655800569257896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual try-on (VTON) aims to synthesize realistic images of a person wearing a target garment, with broad applications in e-commerce and digital fashion. While recent advances in latent diffusion models have substantially improved visual quality, existing approaches still struggle with preserving fine-grained garment details, achieving precise garment-body alignment, maintaining inference efficiency, and generalizing to diverse poses and clothing styles. To address these challenges, we propose DiffFit, a novel two-stage latent diffusion framework for high-fidelity virtual try-on. DiffFit adopts a progressive generation strategy: the first stage performs geometry-aware garment warping, aligning the garment with the target body through fine-grained deformation and pose adaptation. The second stage refines texture fidelity via a cross-modal conditional diffusion model that integrates the warped garment, the original garment appearance, and the target person image for high-quality rendering. By decoupling geometric alignment and appearance refinement, DiffFit effectively reduces task complexity and enhances both generation stability and visual realism. It excels in preserving garment-specific attributes such as textures, wrinkles, and lighting, while ensuring accurate alignment with the human body. Extensive experiments on large-scale VTON benchmarks demonstrate that DiffFit achieves superior performance over existing state-of-the-art methods in both quantitative metrics and perceptual evaluations.
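To make the two-stage design concrete, below is a minimal, hypothetical PyTorch sketch of the pipeline the abstract describes: a geometry-aware warping stage that predicts a dense flow to align the garment with the target pose, followed by a conditional denoiser that fuses the warped garment, the original garment appearance, and the person image. All module and parameter names (`GeometryAwareWarper`, `TextureRefinementDenoiser`, channel counts) are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a two-stage warp-then-refine VTON pipeline.
# Module names and shapes are illustrative, not DiffFit's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GeometryAwareWarper(nn.Module):
    """Stage 1 (sketch): predict a dense (dx, dy) flow field from the
    garment and a pose representation, then warp the garment onto the body."""

    def __init__(self, in_ch: int = 6, hidden: int = 32):
        super().__init__()
        # Tiny conv net standing in for the real flow estimator.
        self.flow_net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 2, 3, padding=1),
        )

    def forward(self, garment: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        b, _, h, w = garment.shape
        flow = self.flow_net(torch.cat([garment, pose], dim=1))
        # Base sampling grid in [-1, 1], offset by the predicted flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=garment.device),
            torch.linspace(-1, 1, w, device=garment.device),
            indexing="ij",
        )
        grid = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2) + flow.permute(0, 2, 3, 1)
        return F.grid_sample(garment, grid, align_corners=True)


class TextureRefinementDenoiser(nn.Module):
    """Stage 2 (sketch): one denoising step conditioned on the warped
    garment, the original garment appearance, and the target person image
    (all assumed resized to the latent resolution)."""

    def __init__(self, latent_ch: int = 4, cond_ch: int = 9, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + cond_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, latent_ch, 3, padding=1),
        )

    def forward(self, noisy_latent, warped, garment, person):
        cond = torch.cat([warped, garment, person], dim=1)
        return self.net(torch.cat([noisy_latent, cond], dim=1))  # predicted noise


# Toy end-to-end pass with random tensors.
garment, pose, person = (torch.randn(1, 3, 64, 64) for _ in range(3))
warped = GeometryAwareWarper()(garment, pose)                        # stage 1
noise_pred = TextureRefinementDenoiser()(torch.randn(1, 4, 64, 64),
                                         warped, garment, person)    # stage 2
print(warped.shape, noise_pred.shape)  # (1, 3, 64, 64) and (1, 4, 64, 64)
```

Decoupling the stages this way mirrors the abstract's claim that separating geometric alignment from appearance refinement reduces task complexity: each stage can be trained and inspected in isolation.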
Related papers
- Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off [9.45991209383675]
We propose Voost - a unified framework that jointly learns virtual try-on and try-off with a single diffusion transformer. Voost achieves state-of-the-art results on both try-on and try-off benchmarks, consistently outperforming strong baselines in alignment accuracy, visual fidelity, and generalization.
arXiv Detail & Related papers (2025-08-06T19:10:58Z)
- OmniVTON: Training-Free Universal Virtual Try-On [53.31945401098557]
Image-based Virtual Try-On (VTON) techniques rely on either supervised in-shop approaches or unsupervised in-the-wild methods; the latter improve adaptability but remain constrained by data biases and limited universality. We propose OmniVTON, the first training-free universal VTON framework that decouples garment and pose conditioning to achieve both texture fidelity and pose consistency across diverse settings.
arXiv Detail & Related papers (2025-07-20T16:37:53Z)
- DS-VTON: High-Quality Virtual Try-on via Disentangled Dual-Scale Generation [38.499761393356124]
DS-VTON is a dual-scale virtual try-on framework that disentangles objectives for more effective modeling. Our method adopts a fully mask-free generation paradigm, eliminating reliance on human parsing maps or segmentation masks.
arXiv Detail & Related papers (2025-06-01T08:52:57Z)
- HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment [11.00877062567135]
We propose HF-VTON, a novel framework that ensures high-fidelity virtual try-on performance across diverse poses. HF-VTON consists of three key modules: the Appearance-Preserving Warp Alignment Module, the Semantic Representation Module, and the Multimodal Prior-Guided Appearance Generation Module. Experimental results demonstrate that HF-VTON outperforms state-of-the-art methods on both VITON-HD and SAMP-VTONS.
arXiv Detail & Related papers (2025-05-26T07:55:49Z)
- FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on [73.13242624924814]
FitDiT, a garment perception enhancement technique, is designed for high-fidelity virtual try-on using Diffusion Transformers (DiT).
We introduce a garment texture extractor that incorporates garment priors evolution to fine-tune garment features, better capturing rich details such as stripes, patterns, and text.
We also employ a dilated-relaxed mask strategy that adapts to the correct length of garments, preventing the generation of garments that fill the entire mask area during cross-category try-on; a minimal sketch of this mask-dilation idea appears after the list below.
arXiv Detail & Related papers (2024-11-15T11:02:23Z)
- Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling in numerous image synthesis tasks.
We present a new diffusion model, GarDiff, which triggers a garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z)
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
AnyFit's impressive performance on high-fidelity virtual try-on in any scenario from any image paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on [81.15988741258683]
Image-based Virtual Try-On (VITON) aims to transfer an in-shop garment image onto a target person.
Current methods often overlook the synthesis quality around the garment-skin boundary and realistic effects like wrinkles and shadows on the warped garments.
We propose WarpDiffusion, which bridges the warping-based and diffusion-based paradigms via a novel informative and local garment feature attention mechanism.
arXiv Detail & Related papers (2023-12-06T18:34:32Z)
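As a side note on the dilated-relaxed mask strategy mentioned in the FitDiT entry above: the summary gives no implementation details, but the general idea of relaxing a garment mask can be illustrated with plain binary dilation. The function name `relax_mask` and the margin value below are assumptions for illustration, not FitDiT's actual method.

```python
# Illustrative only: generic binary-mask dilation, not FitDiT's exact
# "dilated-relaxed" strategy (whose parameters the summary does not give).
import numpy as np
from scipy.ndimage import binary_dilation


def relax_mask(mask: np.ndarray, margin_px: int = 8) -> np.ndarray:
    """Expand a binary garment mask by roughly `margin_px` pixels so the
    inpainting region loosely covers the garment rather than tracing its
    exact silhouette."""
    return binary_dilation(mask, iterations=margin_px)


# Toy example: a 4x4 garment mask grows outward by 2 pixels.
mask = np.zeros((16, 16), dtype=bool)
mask[6:10, 6:10] = True
relaxed = relax_mask(mask, margin_px=2)
print(mask.sum(), relaxed.sum())  # prints: 16 52
```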