OmniTry: Virtual Try-On Anything without Masks
- URL: http://arxiv.org/abs/2508.13632v1
- Date: Tue, 19 Aug 2025 08:47:31 GMT
- Title: OmniTry: Virtual Try-On Anything without Masks
- Authors: Yutong Feng, Linlin Zhang, Hengyuan Cao, Yiming Chen, Xiaoduan Feng, Jian Cao, Yuxiong Wu, Bin Wang
- Abstract summary: This paper presents OmniTry, a unified framework that extends Virtual Try-ON (VTON) beyond garments to encompass any wearable object. Curating paired images, i.e., the object image and the corresponding try-on result, is challenging.
- Score: 13.981452272679785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Virtual Try-ON (VTON) is a practical and widely applied task, for which most existing works focus on clothes. This paper presents OmniTry, a unified framework that extends VTON beyond garments to encompass any wearable object, e.g., jewelry and accessories, in a mask-free setting for more practical application. When extending to various types of objects, curating paired images, i.e., the object image and the corresponding try-on result, is challenging. To tackle this problem, we propose a two-stage pipeline. In the first stage, we leverage large-scale unpaired images, i.e., portraits with any wearable items, to train the model for mask-free localization. Specifically, we repurpose the inpainting model to automatically draw objects in suitable positions given an empty mask. In the second stage, the model is further fine-tuned with paired images to transfer the consistency of object appearance. We observe that the model after the first stage converges quickly even with few paired samples. OmniTry is evaluated on a comprehensive benchmark consisting of 12 common classes of wearable objects, with both in-shop and in-the-wild images. Experimental results suggest that OmniTry outperforms existing methods on both object localization and ID preservation. The code, model weights, and evaluation benchmark of OmniTry will be made publicly available at https://omnitry.github.io/.
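The two-stage schedule described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' code: the names `TrainingSample`, `empty_mask`, `stage1_sample`, `stage2_sample`, and `build_schedule` are hypothetical, images are toy nested lists, and the choice to pass no object image in stage 1 is an assumption. The abstract also does not say how the object-free input portrait is obtained, so it is simply taken as given here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Image = List[List[float]]  # toy stand-in for an image: rows of pixel values
Mask = List[List[int]]     # 1 = region to repaint, 0 = keep


@dataclass
class TrainingSample:
    person: Image               # portrait the model edits (hypothetical field)
    mask: Mask                  # inpainting mask fed to the model
    reference: Optional[Image]  # object image; None when no paired image exists
    target: Image               # portrait with the object worn


def empty_mask(image: Image) -> Mask:
    """All-zero mask: no region is marked, so the model must choose the location."""
    return [[0] * len(row) for row in image]


def stage1_sample(person: Image, worn: Image) -> TrainingSample:
    # Stage 1 (unpaired data): no object image is supplied (assumption),
    # only an empty mask, so the sample supervises localization.
    return TrainingSample(person, empty_mask(person), None, worn)


def stage2_sample(person: Image, worn: Image, obj: Image) -> TrainingSample:
    # Stage 2 (paired data): the object image is added as a reference so the
    # sample also supervises appearance consistency.
    return TrainingSample(person, empty_mask(person), obj, worn)


def build_schedule(
    unpaired: List[Tuple[Image, Image]],
    paired: List[Tuple[Image, Image, Image]],
) -> List[Tuple[int, TrainingSample]]:
    """Return (stage, sample) pairs: all stage-1 samples, then stage-2 samples."""
    schedule = [(1, stage1_sample(p, w)) for p, w in unpaired]
    schedule += [(2, stage2_sample(p, w, o)) for p, w, o in paired]
    return schedule


if __name__ == "__main__":
    person = [[0.0, 0.0], [0.0, 0.0]]
    worn = [[0.0, 1.0], [0.0, 0.0]]
    obj = [[1.0]]
    schedule = build_schedule([(person, worn)] * 3, [(person, worn, obj)])
    print([stage for stage, _ in schedule])  # prints [1, 1, 1, 2]
```

The toy run uses three unpaired samples and one paired sample only to mirror the abstract's remark that few paired samples are needed after the first stage; the actual ratio is not given in the abstract.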
Related papers
- EVTAR: End-to-End Try on with Additional Unpaired Visual Reference [16.702488896886845]
We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference.
Our model generates try-on results without masks, densepose, or segmentation maps.
We enrich the training data with supplementary references and unpaired person images to support these capabilities.
(2025-11-02T14:32:31Z)
- One Model For All: Partial Diffusion for Unified Try-On and Try-Off in Any Pose [99.056324701764]
We introduce OMFA (One Model For All), a unified diffusion framework for both virtual try-on and try-off.
The framework is entirely mask-free and requires only a single portrait and a target pose as input.
It achieves state-of-the-art results on both try-on and try-off tasks, providing a practical and generalizable solution for virtual garment synthesis.
(2025-08-06T15:46:01Z)
- OmniVTON: Training-Free Universal Virtual Try-On [53.31945401098557]
Image-based Virtual Try-On (VTON) techniques rely on either supervised in-shop approaches, or unsupervised in-the-wild methods, which improve adaptability but remain constrained by data biases and limited universality.
We propose OmniVTON, the first training-free universal VTON framework that decouples garment and pose conditioning to achieve both texture fidelity and pose consistency across diverse settings.
(2025-07-20T16:37:53Z)
- MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input [69.33864837012202]
We propose a Mask-Free VITON framework that achieves realistic VITON using only a single person image and a target garment.
We leverage existing Mask-based VITON models to synthesize a high-quality dataset.
This dataset contains diverse, realistic pairs of person images and corresponding garments, augmented with varied backgrounds to mimic real-world scenarios.
(2025-03-11T17:40:59Z)
- Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks [31.461116368933165]
Image-based virtual try-on (VTON) aims to generate a virtual try-on result by transferring an input garment onto a target person's image.
The scarcity of paired garment-model data makes it challenging for existing methods to achieve high generalization and quality in VTON.
We propose Any2AnyTryon, which can generate try-on results based on different textual instructions and model garment images.
(2025-01-27T09:33:23Z)
- Try-On-Adapter: A Simple and Flexible Try-On Paradigm [42.2724473500475]
Image-based virtual try-on, widely used in online shopping, aims to generate images of a naturally dressed person conditioned on certain garments.
Previous methods focus on masking certain parts of the original model's standing image, and then inpainting on masked areas to generate realistic images of the model wearing corresponding reference garments.
We propose Try-On-Adapter (TOA), an outpainting paradigm that differs from the existing inpainting paradigm.
(2024-11-15T13:35:58Z)
- Generic Objects as Pose Probes for Few-shot View Synthesis [14.768563613747633]
Radiance fields including NeRFs and 3D Gaussians demonstrate great potential in high-fidelity rendering and scene reconstruction.
COLMAP is frequently employed for preprocessing to estimate poses.
We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images.
(2024-08-29T16:37:58Z)
- MV-VTON: Multi-View Virtual Try-On with Diffusion Models [91.71150387151042]
The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing.
Existing methods solely focus on the frontal try-on using the frontal clothing.
We introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes.
(2024-04-26T12:27:57Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
(2024-03-08T08:12:18Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
(2023-07-18T17:59:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.