MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization
- URL: http://arxiv.org/abs/2508.08488v1
- Date: Mon, 11 Aug 2025 21:45:07 GMT
- Title: MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization
- Authors: Ankan Deria, Dwarikanath Mahapatra, Behzad Bozorgtabar, Mohna Chakraborty, Snehashis Chakraborty, Sudipta Roy
- Abstract summary: We introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. This architecture supports prompt-based customization, allowing fine-grained garment modifications with minimal user input.
- Score: 19.780800887427937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Virtual try-on seeks to generate photorealistic images of individuals in desired garments, a task that must simultaneously preserve personal identity and garment fidelity for practical use in fashion retail and personalization. However, existing methods typically handle upper and lower garments separately, rely on heavy preprocessing, and often fail to preserve person-specific cues such as tattoos, accessories, and body shape, resulting in limited realism and flexibility. To this end, we introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. Specifically, we propose three key modules: the Garment Representation Module (GRM) for capturing garment semantics, the Person Representation Module (PRM) for encoding identity and pose cues, and the A-DiT fusion module, which integrates garment, person, and text-prompt features through a diffusion transformer. This architecture supports prompt-based customization, allowing fine-grained garment modifications with minimal user input. Extensive experiments on the VITON-HD and DressCode benchmarks demonstrate that MuGa-VTON outperforms existing methods in both qualitative and quantitative evaluations, producing high-fidelity, identity-preserving results suitable for real-world virtual try-on applications.
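The abstract names the three modules but gives no implementation details; the following is a minimal PyTorch sketch of one plausible wiring of that design. The patch-embedding encoders, the token-concatenation fusion interface, all dimensions, and the class bodies are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the GRM/PRM/A-DiT layout named in the abstract. All
# hyperparameters, layer choices, and the concatenation-based conditioning
# are assumptions for illustration, not the released implementation.
import torch
import torch.nn as nn

class GarmentRepresentationModule(nn.Module):
    """GRM sketch: patchify upper and lower garments into one token sequence."""
    def __init__(self, dim=768):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)

    def forward(self, upper, lower):
        tokens = [self.patchify(g).flatten(2).transpose(1, 2) for g in (upper, lower)]
        return torch.cat(tokens, dim=1)  # upper and lower garments modeled jointly

class PersonRepresentationModule(nn.Module):
    """PRM sketch: encode the person image (identity and pose cues) as tokens."""
    def __init__(self, dim=768):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)

    def forward(self, person):
        return self.patchify(person).flatten(2).transpose(1, 2)

class ADiTBlock(nn.Module):
    """Fusion sketch: noisy latents attend to garment + person + text tokens."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, cond):
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        x = x + self.cross_attn(self.norm2(x), cond, cond)[0]
        return x + self.mlp(self.norm3(x))

# Conditioning: concatenate garment, person, and text tokens in one shared
# sequence, then run a denoising-transformer step (shapes are illustrative).
grm, prm, block = GarmentRepresentationModule(), PersonRepresentationModule(), ADiTBlock()
upper = lower = person = torch.randn(1, 3, 256, 256)
text_tokens = torch.randn(1, 77, 768)            # stand-in for a text encoder
cond = torch.cat([grm(upper, lower), prm(person), text_tokens], dim=1)
latents = block(torch.randn(1, 256, 768), cond)  # one A-DiT-style fusion step
```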
Related papers
- OmniVTON++: Training-Free Universal Virtual Try-On with Principal Pose Guidance [85.23143742905695]
Image-based Virtual Try-On (VTON) concerns the synthesis of realistic person imagery through garment re-rendering under human pose and body constraints. We present OmniVTON++, a training-free VTON framework designed for universal applicability.
arXiv Detail & Related papers (2026-02-16T08:27:43Z)
- Neural Clothing Tryer: Customized Virtual Try-On via Semantic Enhancement and Controlling Diffusion Model [35.49427419001177]
This work aims to address a novel Customized Virtual Try-ON (Cu-VTON) task. It enables the superimposition of a specified garment onto a model that can be customized in terms of appearance, posture, and additional attributes.
arXiv Detail & Related papers (2026-01-30T11:05:14Z)
- Undress to Redress: A Training-Free Framework for Virtual Try-On [19.00614787972817]
We propose UR-VTON (Undress-Redress Virtual Try-ON), a training-free framework that can be seamlessly integrated with any existing VTON method. UR-VTON introduces an "undress-to-redress" mechanism: it first reveals the user's torso by virtually "undressing", then applies the target short-sleeve garment. We also present LS-TON, a new benchmark for long-sleeve-to-short-sleeve try-on.
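As a rough illustration of the undress-to-redress mechanism, the sketch below chains the two stages behind a single call. Both callables are hypothetical placeholders; the training-free premise is that any existing VTON model can fill the second slot unchanged.

```python
# Hedged sketch of the two-stage "undress-to-redress" mechanism described
# above. `undress` and `vton` are hypothetical placeholders, not the
# paper's API: any off-the-shelf VTON model could serve as `vton`.
from typing import Callable
import torch

def ur_vton(person: torch.Tensor,
            garment: torch.Tensor,
            undress: Callable[[torch.Tensor], torch.Tensor],
            vton: Callable[[torch.Tensor, torch.Tensor], torch.Tensor]) -> torch.Tensor:
    """Stage 1: reveal the torso; Stage 2: re-dress with the target garment."""
    revealed = undress(person)       # e.g., inpaint away long sleeves to expose arms
    return vton(revealed, garment)   # standard try-on applied to the revealed body
```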
arXiv Detail & Related papers (2025-08-11T06:55:49Z)
- One Model For All: Partial Diffusion for Unified Try-On and Try-Off in Any Pose [99.056324701764]
We introduce OMFA (One Model For All), a unified diffusion framework for both virtual try-on and try-off. The framework is entirely mask-free and requires only a single portrait and a target pose as input. It achieves state-of-the-art results on both try-on and try-off tasks, providing a practical and generalizable solution for virtual garment synthesis.
arXiv Detail & Related papers (2025-08-06T15:46:01Z)
- OmniVTON: Training-Free Universal Virtual Try-On [53.31945401098557]
Image-based Virtual Try-On (VTON) techniques rely on either supervised in-shop approaches or unsupervised in-the-wild methods, which improve adaptability but remain constrained by data biases and limited universality. We propose OmniVTON, the first training-free universal VTON framework that decouples garment and pose conditioning to achieve both texture fidelity and pose consistency across diverse settings.
arXiv Detail & Related papers (2025-07-20T16:37:53Z)
- Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals [76.96387718150542]
We present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF). Our architecture is designed to receive garment information from multiple modalities (images, text, and masks) to work in a multi-category setting. Experiments on the VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state of the art on the VTOFF task.
arXiv Detail & Related papers (2025-05-27T11:47:51Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 addresses a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
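The frozen/trainable split described above can be illustrated with a minimal PyTorch sketch; the additive merge of the two attention paths and all dimensions are assumptions, not IMAGDressing-v1's released code.

```python
# Sketch of the hybrid attention idea: the pretrained self-attention path
# stays frozen to preserve the base UNet prior, while a trainable
# cross-attention path injects garment tokens from the garment UNet.
import torch.nn as nn

class HybridAttention(nn.Module):
    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn.requires_grad_(False)  # frozen self-attention
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # trainable

    def forward(self, hidden, garment_tokens):
        base, _ = self.self_attn(hidden, hidden, hidden)
        garment, _ = self.cross_attn(hidden, garment_tokens, garment_tokens)
        return base + garment  # merge base content with injected garment features
```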
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN [66.3650689395967]
We propose a texture-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN (PASTA-GAN), that facilitates real-world unpaired virtual try-on.
To disentangle the style and spatial information of each garment, PASTA-GAN includes an innovative patch-routed disentanglement module.
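A loose sketch of the patch-routing idea follows, under the assumption that per-patch appearance codes are extracted independently of where each patch lands on the target pose; the patch size, encoder, and routing scheme are all illustrative.

```python
# Loose sketch of patch routing: cut the garment into patches and encode each
# patch's appearance (style) separately from its placement (spatial
# information). Patch size, encoder, and routing scheme are assumptions.
import torch.nn as nn

class PatchRoutedEncoder(nn.Module):
    def __init__(self, patch=32, dim=256):
        super().__init__()
        self.unfold = nn.Unfold(kernel_size=patch, stride=patch)  # cut into patches
        self.style = nn.Linear(3 * patch * patch, dim)            # per-patch style code

    def forward(self, garment):
        patches = self.unfold(garment).transpose(1, 2)  # (B, n_patches, 3*p*p)
        return self.style(patches)  # style codes; placement is routed separately
```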
arXiv Detail & Related papers (2021-11-20T08:36:12Z)