EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
- URL: http://arxiv.org/abs/2511.00956v1
- Date: Sun, 02 Nov 2025 14:32:31 GMT
- Title: EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
- Authors: Liuzhuozheng Li, Yue Gong, Shanyuan Liu, Bo Cheng, Yuhang Ma, Liebucha Wu, Dengyang Jiang, Zanyi Wang, Dawei Leng, Yuhui Yin
- Abstract summary: We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference. Our model generates try-on results without masks, densepose, or segmentation maps. We enrich the training data with supplementary references and unpaired person images to support these capabilities.
- Score: 16.702488896886845
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference, that directly fits the target garment onto the person image while incorporating reference images to enhance try-on accuracy. Most existing virtual try-on approaches rely on complex inputs such as agnostic person images, human pose, densepose, or body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTAR adopts a two-stage training strategy, enabling simple inference with only the source image and the target garment inputs. Our model generates try-on results without masks, densepose, or segmentation maps. Moreover, EVTAR leverages additional reference images of different individuals wearing the same clothes to preserve garment texture and fine-grained details better. This mechanism is analogous to how humans consider reference models when choosing outfits, thereby simulating a more realistic and high-quality dressing effect. We enrich the training data with supplementary references and unpaired person images to support these capabilities. We evaluate EVTAR on two widely used benchmarks and diverse tasks, and the results consistently validate the effectiveness of our approach.
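The mask-free interface described in the abstract can be sketched as follows. This is a minimal illustration with hypothetical names and shapes, not the authors' code; the point is only that inference conditions on the source person image and the target garment (plus an optional unpaired reference), with no agnostic image, densepose, or keypoints:

```python
import numpy as np

def build_condition(person, garment, reference=None):
    """Stack conditioning inputs along the channel axis.

    Unlike mask-based try-on pipelines, no agnostic person image,
    densepose, or segmentation map is required: only the source
    image and the target garment, plus an optional unpaired
    reference of another person wearing the same clothes.
    """
    parts = [person, garment]
    if reference is not None:
        parts.append(reference)  # additional visual reference
    return np.concatenate(parts, axis=0)  # (C_total, H, W)

person = np.zeros((3, 64, 64))   # source person image
garment = np.zeros((3, 64, 64))  # target garment image
cond = build_condition(person, garment)
print(cond.shape)  # (6, 64, 64)
```

A downstream denoiser would consume `cond` directly; the two-stage training strategy in the paper is what makes this simple inference interface possible.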
Related papers
- One Model For All: Partial Diffusion for Unified Try-On and Try-Off in Any Pose [99.056324701764]
We introduce OMFA (One Model For All), a unified diffusion framework for both virtual try-on and try-off. The framework is entirely mask-free and requires only a single portrait and a target pose as input. It achieves state-of-the-art results on both try-on and try-off tasks, providing a practical and generalizable solution for virtual garment synthesis.
arXiv Detail & Related papers (2025-08-06T15:46:01Z) - Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling [20.072689146353348]
We introduce a garment extraction model that generates (human, synthetic garment) pairs from a single image of a clothed individual. We also propose an Error-Aware Refinement-based Schrödinger Bridge (EARSB) that surgically targets localized generation errors. In user studies, our model is preferred by users in 59% of cases on average.
arXiv Detail & Related papers (2025-01-08T18:25:50Z) - Try-On-Adapter: A Simple and Flexible Try-On Paradigm [42.2724473500475]
Image-based virtual try-on, widely used in online shopping, aims to generate images of a naturally dressed person conditioned on certain garments.
Previous methods focus on masking certain parts of the original model's standing image, and then inpainting on masked areas to generate realistic images of the model wearing corresponding reference garments.
We propose Try-On-Adapter (TOA), an outpainting paradigm that differs from the existing inpainting paradigm.
arXiv Detail & Related papers (2024-11-15T13:35:58Z) - IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 is a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
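A toy sketch of such a hybrid attention module (names and shapes are our own illustration, not the paper's code): the self-attention path of the denoising UNet stays frozen, while only the cross-attention path that injects garment features is trained.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over token rows
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def hybrid_attention(x, garment, w_cross):
    """x: (N, d) denoiser tokens; garment: (M, d) garment-UNet tokens.

    The self-attention branch is frozen; w_cross is the only
    trainable projection in this sketch.
    """
    self_out = attention(x, x, x)                         # frozen branch
    cross_out = attention(x, garment, garment) @ w_cross  # trainable branch
    return self_out + cross_out

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))   # denoising-UNet tokens
g = rng.normal(size=(32, 8))   # garment features (CLIP/VAE-derived)
w = np.zeros((8, 8))           # trainable weights, zero at init
out = hybrid_attention(x, g, w)
print(out.shape)  # (16, 8)
```

With `w` initialized to zero, the module reduces to the frozen self-attention, so training can gradually blend in garment information without disrupting the pretrained denoiser.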
arXiv Detail & Related papers (2024-07-17T16:26:30Z) - Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On [29.217423805933727]
Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks.
We propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results.
We also propose a novel diffusion-based method that predicts a precise inpainting mask based on the person and reference garment images.
arXiv Detail & Related papers (2024-04-01T12:43:22Z) - Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z) - StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On [35.227896906556026]
Given a clothing image and a person image, an image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image.
In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task.
Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process.
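The "zero" in zero cross-attention presumably refers, as in ControlNet-style zero modules, to a zero-initialized output projection, so that at initialization the block leaves the pretrained features untouched. A minimal sketch under that assumption, with hypothetical names:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over token rows
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def zero_cross_attention(x, clothing, w_out):
    """Residual cross-attention whose output projection w_out is
    zero-initialized, so the pretrained diffusion features pass
    through unchanged at the start of training.
    """
    return x + attention(x, clothing, clothing) @ w_out

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))  # pretrained diffusion features
c = rng.normal(size=(24, 8))  # clothing-image features
w_out = np.zeros((8, 8))      # zero initialization
out = zero_cross_attention(x, c, w_out)
print(np.allclose(out, x))  # True: identity at initialization
```

As `w_out` is trained away from zero, the block learns the semantic correspondence between clothing tokens and person tokens without destroying the pretrained model's inherent knowledge.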
arXiv Detail & Related papers (2023-12-04T08:27:59Z) - Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z) - Learning Garment DensePose for Robust Warping in Virtual Try-On [72.13052519560462]
We propose a robust warping method for virtual try-on based on a learned garment DensePose.
Our method achieves performance equivalent to the state of the art on virtual try-on benchmarks.
arXiv Detail & Related papers (2023-03-30T20:02:29Z) - Apparel-invariant Feature Learning for Apparel-changed Person Re-identification [70.16040194572406]
Most public ReID datasets are collected in a short time window in which persons' appearance rarely changes.
In real-world applications such as a shopping mall, the same person's clothing may change, and different persons may wear similar clothes.
It is critical to learn an apparel-invariant person representation under cases like cloth changing or several persons wearing similar clothes.
arXiv Detail & Related papers (2020-08-14T03:49:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.