Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
- URL: http://arxiv.org/abs/2505.21062v1
- Date: Tue, 27 May 2025 11:47:51 GMT
- Title: Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
- Authors: Davide Lobba, Fulvio Sanguigni, Bin Ren, Marcella Cornia, Rita Cucchiara, Nicu Sebe
- Abstract summary: We present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF). Our architecture is designed to receive garment information from multiple modalities such as images, text, and masks so that it works in a multi-category setting. Experiments on the VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state of the art on the VTOFF task.
- Score: 76.96387718150542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While virtual try-on (VTON) systems aim to render a garment onto a target person image, this paper tackles the novel task of virtual try-off (VTOFF), which addresses the inverse problem: generating standardized product images of garments from real-world photos of clothed individuals. Unlike VTON, which must resolve diverse pose and style variations, VTOFF benefits from a consistent and well-defined output format -- typically a flat, lay-down-style representation of the garment -- making it a promising tool for data generation and dataset enhancement. However, existing VTOFF approaches face two major limitations: (i) difficulty in disentangling garment features from occlusions and complex poses, often leading to visual artifacts, and (ii) restricted applicability to single-category garments (e.g., upper-body clothes only), limiting generalization. To address these challenges, we present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF), a novel architecture featuring a dual DiT-based backbone with a modified multimodal attention mechanism for robust garment feature extraction. Our architecture is designed to receive garment information from multiple modalities like images, text, and masks to work in a multi-category setting. Finally, we propose an additional alignment module to further refine the generated visual details. Experiments on VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state-of-the-art on the VTOFF task, significantly improving both visual quality and fidelity to the target garments.
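The abstract gives no implementation details, so the following is only a loose PyTorch sketch of the modified multimodal attention idea: denoising tokens cross-attend to a concatenation of garment-image, text, and mask tokens, with learned type embeddings to tell the modalities apart. All module names, dimensions, and the type-embedding trick are assumptions, not TEMU-VTOFF's actual dual-DiT design.

```python
import torch
import torch.nn as nn


class MultimodalGarmentAttention(nn.Module):
    """Cross-attends denoising tokens to concatenated garment conditions."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned type embeddings distinguish the three condition modalities.
        self.type_embed = nn.Embedding(3, dim)  # 0: image, 1: text, 2: mask

    def forward(self, latent_tokens, image_tokens, text_tokens, mask_tokens):
        # Tag each modality so attention can tell the conditions apart.
        conds = torch.cat([
            image_tokens + self.type_embed.weight[0],
            text_tokens + self.type_embed.weight[1],
            mask_tokens + self.type_embed.weight[2],
        ], dim=1)                                    # (B, N_cond, dim)
        out, _ = self.attn(latent_tokens, conds, conds)
        return latent_tokens + out                   # residual update


# Shape check with dummy tokens.
attn = MultimodalGarmentAttention()
lat = torch.randn(2, 256, 768)
img, txt, msk = torch.randn(2, 196, 768), torch.randn(2, 77, 768), torch.randn(2, 196, 768)
print(attn(lat, img, txt, msk).shape)  # torch.Size([2, 256, 768])
```

A real dual-backbone setup would extract the garment tokens with a second DiT and add the paper's alignment module on top; this sketch only shows how three condition streams can share one attention call.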
Related papers
- OmniVTON: Training-Free Universal Virtual Try-On [53.31945401098557]
Image-based Virtual Try-On (VTON) techniques rely on either supervised in-shop approaches or unsupervised in-the-wild methods, which improve adaptability but remain constrained by data biases and limited universality. We propose OmniVTON, the first training-free universal VTON framework that decouples garment and pose conditioning to achieve both texture fidelity and pose consistency across diverse settings.
arXiv Detail & Related papers (2025-07-20T16:37:53Z)
- MGT: Extending Virtual Try-Off to Multi-Garment Scenarios [8.158200403139196]
We introduce Multi-Garment TryOffDiff (MGT), a diffusion-based VTOFF model capable of handling diverse garment types. MGT incorporates class-specific embeddings (sketched after this entry), achieving state-of-the-art VTOFF results on VITON-HD and competitive performance on DressCode.
arXiv Detail & Related papers (2025-04-17T16:45:18Z)
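One plausible reading of "class-specific embeddings" is a learned embedding per garment category added to the diffusion model's pooled conditioning vector. The sketch below assumes exactly that; the category names, dimension, and additive injection are all hypothetical, not MGT's published design.

```python
import torch
import torch.nn as nn

# Hypothetical: one learned embedding per garment category, added to the
# pooled conditioning vector that steers the diffusion denoiser.
CATEGORIES = {"upper_body": 0, "lower_body": 1, "dresses": 2}


class ClassConditioning(nn.Module):
    def __init__(self, cond_dim: int = 1280, num_classes: int = len(CATEGORIES)):
        super().__init__()
        self.class_embed = nn.Embedding(num_classes, cond_dim)

    def forward(self, pooled_cond: torch.Tensor, category: str) -> torch.Tensor:
        idx = torch.tensor([CATEGORIES[category]], device=pooled_cond.device)
        return pooled_cond + self.class_embed(idx)  # (B, cond_dim)


cond = torch.randn(1, 1280)
print(ClassConditioning()(cond, "dresses").shape)  # torch.Size([1, 1280])
```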
- TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models [8.158200403139196]
This paper introduces Virtual Try-Off (VTOFF), a novel task focused on generating standardized garment images from single photos of clothed individuals. We present TryOffDiff, a model that adapts Stable Diffusion with SigLIP-based visual conditioning (sketched after this entry) to ensure high fidelity and detail retention. Our results highlight the potential of VTOFF to enhance product imagery in e-commerce applications, advance generative model evaluation, and inspire future work on high-fidelity reconstruction.
arXiv Detail & Related papers (2024-11-27T13:53:09Z)
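SigLIP vision encoders ship with Hugging Face transformers, so the conditioning step might look roughly like this; the checkpoint choice and projection width are assumptions, not TryOffDiff's exact configuration.

```python
import torch
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

# Encode the person photo with SigLIP, then project the patch tokens to the
# denoiser's cross-attention width. Checkpoint and widths are illustrative.
ckpt = "google/siglip-base-patch16-224"
processor = SiglipImageProcessor.from_pretrained(ckpt)
encoder = SiglipVisionModel.from_pretrained(ckpt).eval()

image = Image.open("person.jpg").convert("RGB")  # hypothetical input file
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    feats = encoder(pixel_values=pixels).last_hidden_state  # (1, 196, 768)

# A trainable projection would map these tokens into the diffusion model's
# cross-attention conditioning space.
proj = torch.nn.Linear(feats.shape[-1], 768)
cond_tokens = proj(feats)
print(cond_tokens.shape)  # torch.Size([1, 196, 768])
```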
- FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on [73.13242624924814]
FitDiT is a garment perception enhancement technique designed for high-fidelity virtual try-on using Diffusion Transformers (DiT).
It introduces a garment texture extractor that incorporates garment-prior evolution to fine-tune garment features, helping the model capture rich details such as stripes, patterns, and text.
It also employs a dilated-relaxed mask strategy that adapts to the correct garment length, preventing generated garments from filling the entire mask area during cross-category try-on (sketched after this entry).
arXiv Detail & Related papers (2024-11-15T11:02:23Z)
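The dilated-relaxed mask idea can be approximated by morphologically dilating a tight garment segmentation so the mask stops dictating the garment's exact silhouette. A minimal OpenCV sketch, with an illustrative kernel size:

```python
import cv2
import numpy as np


def relax_mask(mask: np.ndarray, kernel_size: int = 25) -> np.ndarray:
    """Dilate a tight garment mask (uint8, 255 inside the garment) outward
    so the try-on model is not forced to fill it edge-to-edge."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    return cv2.dilate(mask, kernel, iterations=1)


mask = np.zeros((512, 384), dtype=np.uint8)
mask[100:300, 120:260] = 255           # toy garment region
relaxed = relax_mask(mask)
print(mask.sum() < relaxed.sum())      # True: the mask grew outward
```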
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 addresses virtual dressing: generating freely editable human images with fixed garments and optional conditions.
It incorporates a garment UNet that captures semantic features from CLIP and texture features from a VAE.
A hybrid attention module, consisting of a frozen self-attention and a trainable cross-attention, integrates garment features from the garment UNet into a frozen denoising UNet (sketched after this entry).
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
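A frozen self-attention followed by a trainable cross-attention over garment features might be wired up as below; names and dimensions are assumed, not IMAGDressing-v1's actual layers.

```python
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    """Frozen self-attention from the pretrained denoiser plus a trainable
    cross-attention that injects garment-UNet features (illustrative)."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Keep the pretrained branch frozen; only the garment branch trains.
        for p in self.self_attn.parameters():
            p.requires_grad = False

    def forward(self, x, garment_feats):
        h, _ = self.self_attn(x, x, x)
        x = x + h                                   # frozen pathway
        g, _ = self.cross_attn(x, garment_feats, garment_feats)
        return x + g                                # trainable garment injection


x = torch.randn(2, 256, 768)   # denoising-UNet tokens
g = torch.randn(2, 196, 768)   # garment-UNet features
print(HybridAttention()(x, g).shape)  # torch.Size([2, 256, 768])
```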
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
Its strong performance on high-fidelity virtual try-on, in any scenario and from any image, paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- MV-VTON: Multi-View Virtual Try-On with Diffusion Models [91.71150387151042]
The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing. Existing methods focus solely on frontal try-on using frontal clothing. We introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct dressing results from multiple views using the given clothes.
arXiv Detail & Related papers (2024-04-26T12:27:57Z)