Clothing agnostic Pre-inpainting Virtual Try-ON
- URL: http://arxiv.org/abs/2509.17654v2
- Date: Tue, 23 Sep 2025 06:05:25 GMT
- Title: Clothing agnostic Pre-inpainting Virtual Try-ON
- Authors: Sehyun Kim, Hye Jun Lee, Jiwoo Lee, Taemin Lee
- Abstract summary: CaP-VTON has improved the naturalness and consistency of whole-body clothing synthesis. CaP-VTON recorded 92.5% short-sleeved synthesis accuracy, 15.4% better than Leffa.
- Score: 1.7764619921640012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of deep learning technology, virtual try-on has become an application of significant value in e-commerce, fashion, and entertainment. The recently proposed Leffa improved the texture distortion problem of diffusion-based models, but it has limitations: bottom-garment detection is inaccurate, and the silhouette of the existing clothing remains in the synthesis results. To solve these problems, this study proposes CaP-VTON (Clothing agnostic Pre-inpainting Virtual Try-ON). CaP-VTON improves the naturalness and consistency of whole-body clothing synthesis by integrating multi-category masking based on Dress Code and skin inpainting based on Stable Diffusion. In particular, a skin-generation module is introduced to solve the skin restoration problem that occurs when long-sleeved images are converted into short-sleeved or sleeveless ones, implementing high-quality restoration that accounts for human body posture and color. As a result, CaP-VTON recorded 92.5% short-sleeved synthesis accuracy, 15.4% better than Leffa, and consistently reproduced the style and shape of reference clothing in visual evaluation. These components remain model-agnostic and are applicable to various diffusion-based virtual try-on systems, so they can contribute to applications that require high-precision virtual wearing, such as e-commerce, custom styling, and avatar creation.
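The masking step the abstract describes (erasing the existing clothing silhouette before diffusion-based inpainting) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual Dress-Code-based implementation: the parse-label values, the label set, and the 4-neighbour dilation are all assumptions.

```python
# Hypothetical sketch of the "clothing-agnostic pre-inpainting" mask step.
# Label values and the dilation scheme are illustrative assumptions.

UPPER_CLOTHES, ARMS = 2, 3          # assumed human-parsing labels
MASK_LABELS = {UPPER_CLOTHES, ARMS}

def build_inpaint_mask(parse, dilate=1):
    """Return a binary mask marking pixels the inpainter must repaint.

    parse  -- 2D list of human-parsing labels (one int per pixel)
    dilate -- pixels to grow the mask, so residual clothing edges
              (the artifact the abstract attributes to Leffa) are covered
    """
    h, w = len(parse), len(parse[0])
    mask = [[1 if parse[y][x] in MASK_LABELS else 0 for x in range(w)]
            for y in range(h)]
    # Grow each masked pixel into its 4-neighbourhood `dilate` times.
    for _ in range(dilate):
        grown = [row[:] for row in mask]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            grown[ny][nx] = 1
        mask = grown
    return mask
```

The resulting mask would then be handed to a Stable Diffusion inpainting pipeline together with the person image, as the abstract's skin-generation step suggests.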
Related papers
- Undress to Redress: A Training-Free Framework for Virtual Try-On [19.00614787972817]
We propose UR-VTON (Undress-Redress Virtual Try-ON), a training-free framework that can be seamlessly integrated with any existing VTON method. UR-VTON introduces an "undress-to-redress" mechanism: it first reveals the user's torso by virtually "undressing", then applies the target short-sleeve garment. We also present LS-TON, a new benchmark for long-sleeve-to-short-sleeve try-on.
arXiv Detail & Related papers (2025-08-11T06:55:49Z)
- DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On [3.5655800569257896]
Virtual try-on (VTON) aims to synthesize realistic images of a person wearing a target garment, with broad applications in e-commerce and digital fashion. We propose DiffFit, a novel two-stage latent diffusion framework for high-fidelity virtual try-on.
arXiv Detail & Related papers (2025-06-29T15:31:42Z)
- Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals [76.96387718150542]
We present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF). Our architecture is designed to receive garment information from multiple modalities like images, text, and masks to work in a multi-category setting. Experiments on VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state-of-the-art on the VTOFF task.
arXiv Detail & Related papers (2025-05-27T11:47:51Z)
- Progressive Limb-Aware Virtual Try-On [14.334222729238608]
Existing image-based virtual try-on methods directly transfer specific clothing to a human image. We present a progressive virtual try-on framework, named PL-VTON, which performs pixel-level clothing warping. We also propose a Limb-aware Texture Fusion module to estimate high-quality details in limb regions.
arXiv Detail & Related papers (2025-03-16T17:41:02Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 is a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
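The hybrid attention idea summarised above can be illustrated with a small sketch: a (conceptually frozen) self-attention over the denoising UNet's tokens, plus a (conceptually trainable) cross-attention that attends to garment-UNet tokens, fused residually. Dimensions, the additive fusion, and the pure-Python attention are illustrative assumptions, not IMAGDressing-v1's actual implementation.

```python
# Toy sketch of hybrid attention: self-attention on denoiser tokens plus
# cross-attention over garment tokens. All specifics are assumptions.
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]     # stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

def hybrid_attention(denoiser_tokens, garment_tokens):
    """Self-attention branch plus a cross-attention branch that pulls
    garment features into the denoiser stream; residual additive fusion."""
    self_out = attention(denoiser_tokens, denoiser_tokens, denoiser_tokens)
    cross_out = attention(denoiser_tokens, garment_tokens, garment_tokens)
    return [[a + b for a, b in zip(s, c)]
            for s, c in zip(self_out, cross_out)]
```

In the described design the self-attention weights would stay frozen (reusing the pretrained denoiser) while only the cross-attention is trained to inject garment features.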
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
AnyFit's impressive performance on high-fidelity virtual try-on in any scenario, from any image, paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models [4.038493506169702]
This study emphasizes the challenges of preserving intricate texture details and distinctive features of the target person and the clothes in various scenarios.
Various existing approaches are explored, highlighting the limitations and unresolved aspects.
It then proposes a novel diffusion-based solution that addresses garment texture preservation and user identity retention during virtual try-on.
arXiv Detail & Related papers (2024-03-12T07:15:29Z)
- WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on [81.15988741258683]
Image-based Virtual Try-On (VITON) aims to transfer an in-shop garment image onto a target person.
Current methods often overlook the synthesis quality around the garment-skin boundary and realistic effects like wrinkles and shadows on the warped garments.
We propose WarpDiffusion, which bridges the warping-based and diffusion-based paradigms via a novel informative and local garment feature attention mechanism.
arXiv Detail & Related papers (2023-12-06T18:34:32Z)
- Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN [66.3650689395967]
We propose a texture-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN (PASTA-GAN), that facilitates real-world unpaired virtual try-on.
To disentangle the style and spatial information of each garment, PASTA-GAN consists of an innovative patch-routed disentanglement module.
arXiv Detail & Related papers (2021-11-20T08:36:12Z)
- Cloth Interactive Transformer for Virtual Try-On [106.21605249649957]
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information.
In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask.
arXiv Detail & Related papers (2021-04-12T14:45:32Z)
- VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization [18.347532903864597]
We propose a novel virtual try-on method called VITON-HD that successfully synthesizes 1024x768 virtual try-on images.
We show that VITON-HD highly surpasses the baselines in terms of synthesized image quality both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-03-31T07:52:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.