GO-MLVTON: Garment Occlusion-Aware Multi-Layer Virtual Try-On with Diffusion Models
- URL: http://arxiv.org/abs/2601.13524v2
- Date: Thu, 22 Jan 2026 08:26:07 GMT
- Title: GO-MLVTON: Garment Occlusion-Aware Multi-Layer Virtual Try-On with Diffusion Models
- Authors: Yang Yu, Yunze Deng, Yige Zhang, Yanjie Xiao, Youkun Ou, Wenhao Hu, Mingchao Li, Bin Feng, Wenyu Liu, Dandan Zheng, Jingdong Chen
- Abstract summary: Existing image-based virtual try-on (VTON) methods primarily focus on single-layer or multi-garment VTON. We propose GO-MLVTON, the first multi-layer VTON method, introducing the Garment Occlusion Learning module and the StableDiffusion-based Garment Morphing & Fitting module. We present the MLG dataset for this task and propose a new metric named Layered Appearance Coherence Difference (LACD) for evaluation.
- Score: 37.32099831689131
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing image-based virtual try-on (VTON) methods primarily focus on single-layer or multi-garment VTON, neglecting multi-layer VTON (ML-VTON), which involves dressing multiple layers of garments onto the human body with realistic deformation and layering to generate visually plausible outcomes. The main challenge lies in accurately modeling occlusion relationships between inner and outer garments to reduce interference from redundant inner garment features. To address this, we propose GO-MLVTON, the first multi-layer VTON method, introducing the Garment Occlusion Learning module to learn occlusion relationships and the StableDiffusion-based Garment Morphing & Fitting module to deform and fit garments onto the human body, producing high-quality multi-layer try-on results. Additionally, we present the MLG dataset for this task and propose a new metric named Layered Appearance Coherence Difference (LACD) for evaluation. Extensive experiments demonstrate the state-of-the-art performance of GO-MLVTON. Project page: https://upyuyang.github.io/go-mlvton/.
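The occlusion modeling the abstract describes can be illustrated with a minimal sketch. This is purely hypothetical: the paper's Garment Occlusion Learning module is a learned network that predicts occlusion relationships, whereas here the occlusion mask is simply given, and the compositing is hand-written alpha blending of an outer garment layer over an inner one.

```python
import numpy as np

def composite_layers(inner, outer, occlusion_mask):
    """Composite an outer garment layer over an inner garment layer.

    inner, outer: float arrays of shape (H, W, 3) in [0, 1].
    occlusion_mask: float array of shape (H, W); 1 where the outer
    garment occludes the inner one, 0 where the inner layer is visible.
    (Stand-in for a mask a learned occlusion module would predict.)
    """
    m = occlusion_mask[..., None]          # broadcast mask over channels
    return m * outer + (1.0 - m) * inner

# Toy example: the outer layer covers the left column of a 2x2 image.
inner = np.zeros((2, 2, 3))                # inner garment: black
outer = np.ones((2, 2, 3))                 # outer garment: white
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])
result = composite_layers(inner, outer, mask)
print(result[0, 0, 0], result[0, 1, 0])    # 1.0 0.0
```

The point of learning the mask, rather than fixing it, is that which inner-garment regions survive (collars, cuffs, hems) depends on garment geometry and pose, which is what makes the multi-layer setting harder than single-layer try-on.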
Related papers
- OmniVTON++: Training-Free Universal Virtual Try-On with Principal Pose Guidance [85.23143742905695]
Image-based Virtual Try-On (VTON) concerns the synthesis of realistic person imagery through garment re-rendering under human pose and body constraints. We present OmniVTON++, a training-free VTON framework designed for universal applicability.
arXiv Detail & Related papers (2026-02-16T08:27:43Z) - MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization [19.780800887427937]
We introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. This architecture supports prompt-based customization, allowing fine-grained garment modifications with minimal user input.
arXiv Detail & Related papers (2025-08-11T21:45:07Z) - Undress to Redress: A Training-Free Framework for Virtual Try-On [19.00614787972817]
We propose UR-VTON (Undress-Redress Virtual Try-ON), a training-free framework that can be seamlessly integrated with any existing VTON method. UR-VTON introduces an "undress-to-redress" mechanism: it first reveals the user's torso by virtually "undressing", then applies the target short-sleeve garment. We also present LS-TON, a new benchmark for long-sleeve-to-short-sleeve try-on.
arXiv Detail & Related papers (2025-08-11T06:55:49Z) - OmniVTON: Training-Free Universal Virtual Try-On [53.31945401098557]
Image-based Virtual Try-On (VTON) techniques rely on either supervised in-shop approaches or unsupervised in-the-wild methods, which improve adaptability but remain constrained by data biases and limited universality. We propose OmniVTON, the first training-free universal VTON framework that decouples garment and pose conditioning to achieve both texture fidelity and pose consistency across diverse settings.
arXiv Detail & Related papers (2025-07-20T16:37:53Z) - Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals [76.96387718150542]
We present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF). Our architecture is designed to receive garment information from multiple modalities like images, text, and masks to work in a multi-category setting. Experiments on the VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state-of-the-art on the VTOFF task.
arXiv Detail & Related papers (2025-05-27T11:47:51Z) - CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models [77.39903417768967]
CatVTON is a virtual try-on diffusion model that transfers in-shop or worn garments of arbitrary categories to target individuals. CatVTON consists only of a VAE and a simplified denoising UNet, removing redundant image and text encoders. Experiments demonstrate that CatVTON achieves superior qualitative and quantitative results compared to baseline methods.
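CatVTON's conditioning-by-concatenation idea can be sketched in a few lines. This is a schematic with random arrays, not the actual model: the latent shapes, the use of the width axis, and the absence of any real VAE or UNet here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical VAE latents for the person and garment images
# (channels x height x width; real latents come from a VAE encoder).
person_latent = rng.standard_normal((4, 32, 24))
garment_latent = rng.standard_normal((4, 32, 24))

# CatVTON-style conditioning: concatenate the two latents spatially,
# so a single denoising UNet sees both inputs at once, with no
# separate garment image encoder or text encoder in the loop.
unet_input = np.concatenate([person_latent, garment_latent], axis=2)
print(unet_input.shape)  # (4, 32, 48)
```

The design appeal is parameter economy: conditioning lives entirely in the input layout, so the try-on model is just the VAE plus one UNet.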
arXiv Detail & Related papers (2024-07-21T11:58:53Z) - OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on [7.46772222515689]
OOTDiffusion is a novel network architecture for realistic and controllable image-based virtual try-on.
We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features.
Our experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results.
arXiv Detail & Related papers (2024-03-04T07:17:44Z) - Single Stage Multi-Pose Virtual Try-On [119.95115739956661]
Multi-pose virtual try-on (MPVTON) aims to fit a target garment onto a person at a target pose.
MPVTON provides a better try-on experience, but is also more challenging due to the dual garment and pose editing objectives.
Existing methods adopt a pipeline comprising three disjoint modules including a target semantic layout prediction module, a coarse try-on image generator and a refinement try-on image generator.
In this paper, we propose a novel single-stage model for MPVTON. Key to our model is a parallel flow estimation module that predicts the flow fields for both person and garment images conditioned on
arXiv Detail & Related papers (2022-11-19T15:02:11Z)
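The flow fields mentioned in the MPVTON summary are used to warp the person and garment images toward the target pose. The warping step itself can be sketched as follows; this is a hypothetical nearest-neighbour version for brevity, whereas flow-based try-on models typically use differentiable bilinear sampling inside the network.

```python
import numpy as np

def warp_with_flow(image, flow):
    """Warp an image with a backward flow field.

    image: array of shape (H, W, C).
    flow: array of shape (H, W, 2); flow[y, x] = (dy, dx) is an offset
    into the source, i.e. output[y, x] = image[y + dy, x + dx]
    (nearest-neighbour lookup, clipped at the image border).
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return image[src_y, src_x]

# Toy example: a uniform flow of (0, +1) shifts content one pixel left.
img = np.arange(12, dtype=float).reshape(3, 4, 1)
flow = np.zeros((3, 4, 2))
flow[..., 1] = 1.0
warped = warp_with_flow(img, flow)
print(warped[0, 0, 0])  # 1.0 (was img[0, 1])
```

Predicting person and garment flows in parallel, as the paper describes, lets one module handle both editing objectives (pose change and garment change) instead of chaining disjoint layout, coarse, and refinement stages.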
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.