Try-On-Adapter: A Simple and Flexible Try-On Paradigm
- URL: http://arxiv.org/abs/2411.10187v1
- Date: Fri, 15 Nov 2024 13:35:58 GMT
- Title: Try-On-Adapter: A Simple and Flexible Try-On Paradigm
- Authors: Hanzhong Guo, Jianfeng Zhang, Cheng Zou, Jun Li, Meng Wang, Ruxue Wen, Pingzhong Tang, Jingdong Chen, Ming Yang
- Abstract summary: Image-based virtual try-on, widely used in online shopping, aims to generate images of a naturally dressed person conditioned on certain garments.
Previous methods focus on masking certain parts of the original model's standing image and then inpainting the masked areas to generate realistic images of the model wearing the corresponding reference garments.
We propose Try-On-Adapter (TOA), an outpainting paradigm that differs from the existing inpainting paradigm.
- Abstract: Image-based virtual try-on, widely used in online shopping, aims to generate images of a naturally dressed person conditioned on certain garments, offering significant research and commercial potential. A key challenge of try-on is to generate realistic images of the model wearing the garments while preserving the garment details. Previous methods mask certain parts of the original model's standing image and then inpaint the masked areas to generate realistic images of the model wearing the corresponding reference garments, treating try-on as an inpainting task. However, such implementations require the user to provide a complete, high-quality standing image, which is user-unfriendly in practical applications. In this paper, we propose Try-On-Adapter (TOA), an outpainting paradigm that differs from the existing inpainting paradigm. TOA preserves the given face and garment, naturally imagines the remaining parts of the image, and provides flexible control through various conditions, e.g., garment properties and human pose. In qualitative comparisons, TOA performs excellently on the virtual try-on task even when given relatively low-quality face and garment images. In quantitative comparisons, TOA achieves state-of-the-art FID scores of 5.56 (paired) and 7.23 (unpaired) on the VITON-HD dataset.
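To make the paradigm shift concrete: a mask-based inpainting pipeline repaints the garment region of a complete person image, whereas the outpainting view keeps only the face and garment and imagines everything else. A minimal conceptual sketch using Hugging Face diffusers is shown below; this is not the authors' TOA model, and the checkpoint name, file paths, and prompt are placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Conceptual sketch of the outpainting paradigm, not the TOA implementation:
# preserve the face and garment pixels, regenerate everything else.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # illustrative checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# A rough composite containing just the face and garment (placeholder path).
person = Image.open("face_and_garment.png").convert("RGB").resize((512, 512))

# Pipeline convention: white mask pixels are regenerated, black ones are kept.
# Inpainting try-on would whiten the garment region; the outpainting view
# inverts this, keeping face/garment black and whitening the rest.
mask = Image.open("outpaint_mask.png").convert("L").resize((512, 512))

result = pipe(
    prompt="a full-body photo of a person naturally wearing the garment",
    image=person,
    mask_image=mask,
).images[0]
result.save("try_on_outpainted.png")
```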
Related papers
- BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training (arXiv, 2024-08-12)
Recent methods model virtual try-on as an image mask-and-inpaint task, which requires masking the person image.
Our research found that a mask-free approach can fully leverage spatial and lighting information from the original person image.
We introduce BooW-VTON, a mask-free virtual try-on diffusion model that delivers state-of-the-art try-on quality without the cost of parsing.
- IMAGDressing-v1: Customizable Virtual Dressing (arXiv, 2024-07-17)
IMAGDressing-v1 addresses a virtual dressing task: generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from a VAE.
We present a hybrid attention module, combining a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet; a rough sketch follows below.
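A minimal PyTorch sketch of such a hybrid attention block, assuming token-shaped features and a simple residual layout; the dimensions, normalization, and fusion order are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HybridAttentionBlock(nn.Module):
    """Frozen self-attention plus trainable cross-attention over garment tokens.

    Illustrative sketch of the hybrid attention idea in IMAGDressing-v1;
    the real block layout may differ.
    """
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Self-attention comes from the pretrained denoising UNet and stays frozen.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad_(False)
        # Cross-attention over garment-UNet features is the trainable part.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) denoising-UNet tokens; garment: (B, M, C) garment tokens.
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]               # frozen pathway
        h = self.norm2(x)
        x = x + self.cross_attn(h, garment, garment)[0]  # trainable pathway
        return x
```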
- MV-VTON: Multi-View Virtual Try-On with Diffusion Models (arXiv, 2024-04-26)
The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing.
Existing methods focus solely on frontal try-on using frontal clothing images.
We introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes.
- Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On (arXiv, 2024-04-01)
Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks.
We propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results.
We also propose a diffusion-based method that predicts a precise inpainting mask from the person and reference garment images; a schematic sketch follows below.
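The paper's mask predictor is diffusion-based; purely to show the interface (person and garment images in, per-pixel mask out), here is a schematic stand-in using a plain convolutional head. The architecture and channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class MaskPredictor(nn.Module):
    """Schematic stand-in for a mask-prediction step: maps concatenated
    person and garment images to per-pixel inpainting-mask probabilities.
    TPD's actual predictor is diffusion-based; this conv head only
    illustrates the inputs and outputs."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 1),
        )

    def forward(self, person: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
        # person, garment: (B, 3, H, W) -> mask probabilities (B, 1, H, W)
        return torch.sigmoid(self.net(torch.cat([person, garment], dim=1)))
```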
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild (arXiv, 2024-03-08)
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
- StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On (arXiv, 2023-12-04)
Given a clothing image and a person image, an image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image.
In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task.
Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence, but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process; a minimal sketch follows below.
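The defining trick of a zero-initialized block is that it starts as a no-op, so the frozen pretrained UNet is initially undisturbed and the clothing pathway is blended in as training progresses (the same idea as ControlNet's zero convolutions). A minimal PyTorch sketch with illustrative layer sizes; this is not StableVITON's exact module.

```python
import torch
import torch.nn as nn

class ZeroCrossAttention(nn.Module):
    """Cross-attention whose output projection is zero-initialized, so the
    residual branch contributes nothing at step 0. Sketch of the
    'zero cross-attention block' idea; layer sizes are illustrative."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj_out = nn.Linear(dim, dim)
        nn.init.zeros_(self.proj_out.weight)  # block is a no-op at init
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, x: torch.Tensor, clothing: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) UNet tokens; clothing: (B, M, C) clothing tokens.
        h, _ = self.attn(self.norm(x), clothing, clothing)
        return x + self.proj_out(h)
```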
- Images Speak in Images: A Generalist Painter for In-Context Visual Learning (arXiv, 2022-12-05)
In-context learning allows the model to rapidly adapt to various tasks with only a handful of prompts and examples.
It is unclear how to define the general-purpose task prompts that the vision model can understand and transfer to out-of-domain tasks.
We present Painter, a generalist model that redefines the outputs of core vision tasks as images and specifies the task prompts themselves as images as well; a sketch of the idea follows below.
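One way to picture image-based task prompting is a two-by-two canvas: an example input/output pair defines the task, the query fills the third quadrant, and the model paints the missing fourth. The layout below is an illustrative guess at this scheme, not Painter's exact training format.

```python
import torch

def make_in_context_canvas(prompt_in: torch.Tensor,
                           prompt_out: torch.Tensor,
                           query_in: torch.Tensor) -> torch.Tensor:
    """Stitch an (input, output) example pair and a query into one canvas.

    All tensors are (3, H, W). The model's job would be to paint the blank
    bottom-right quadrant with the query's output, inferring the task from
    the example pair. Illustrative layout only."""
    blank = torch.zeros_like(query_in)               # quadrant to be painted
    top = torch.cat([prompt_in, prompt_out], dim=2)  # example defines the task
    bottom = torch.cat([query_in, blank], dim=2)     # query + empty target slot
    return torch.cat([top, bottom], dim=1)           # (3, 2H, 2W) canvas
```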
- Weakly Supervised High-Fidelity Clothing Model Generation (arXiv, 2021-12-14)
We propose a cheap yet scalable weakly supervised method called Deep Generative Projection (DGP) to address this clothing model generation scenario.
We show that projecting the rough alignment of clothing and body onto the StyleGAN space can yield photo-realistic wearing results; a minimal projection sketch follows below.
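GAN projection of this kind is usually an optimization over the generator's latent space. The sketch below shows the generic loop under explicit assumptions: `generator.mean_latent()` and `generator.synthesis()` are hypothetical stand-ins for a StyleGAN implementation's API, and the plain MSE objective is a placeholder for whatever losses DGP actually uses.

```python
import torch
import torch.nn.functional as F

def project_to_stylegan(generator, rough_composite: torch.Tensor,
                        steps: int = 500, lr: float = 0.05) -> torch.Tensor:
    """Generic latent-space projection loop in the spirit of DGP: search for
    a latent whose synthesized image matches a rough clothing-on-body
    composite, letting the GAN prior make it photo-realistic.

    generator.mean_latent() / generator.synthesis() are hypothetical APIs;
    the MSE loss is a placeholder for the paper's actual objective."""
    w = generator.mean_latent().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = generator.synthesis(w)
        loss = F.mse_loss(img, rough_composite)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator.synthesis(w).detach()
```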
- Data Augmentation using Random Image Cropping for High-Resolution Virtual Try-On (VITON-CROP) (arXiv, 2021-11-16)
By integrating random crop augmentation, VITON-CROP synthesizes images more robustly than existing state-of-the-art virtual try-on models; the augmentation is illustrated below.
In the experiments, we demonstrate that VITON-CROP is superior to VITON-HD both qualitatively and quantitatively.
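Random crop augmentation for try-on has one wrinkle worth showing: the crop must be shared across a training pair so the images stay aligned. A short torchvision sketch; the crop size and the choice of paired inputs are assumptions, not the paper's exact recipe.

```python
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def augment_pair(person_img, clothing_agnostic_img, size=(512, 384)):
    """Sample one random crop and apply it to both images of a training
    pair so they stay spatially aligned. Illustrative of the VITON-CROP
    augmentation; crop size and pairing are assumptions."""
    i, j, h, w = T.RandomCrop.get_params(person_img, output_size=size)
    return (TF.crop(person_img, i, j, h, w),
            TF.crop(clothing_agnostic_img, i, j, h, w))
```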
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.