BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
- URL: http://arxiv.org/abs/2408.06047v2
- Date: Fri, 22 Nov 2024 10:45:11 GMT
- Title: BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
- Authors: Xuanpu Zhang, Dan Song, Pengxin Zhan, Tianyu Chang, Jianhao Zeng, Qingguo Chen, Weihua Luo, Anan Liu
- Abstract summary: Recent methods model virtual try-on as an image inpainting task, which requires masking the person image.
Our research found that a mask-free approach can fully leverage spatial and lighting information from the original person image.
We introduce BooW-VTON, a mask-free virtual try-on diffusion model that delivers SOTA try-on quality without any parsing cost.
- Score: 32.77901123889236
- Abstract: Image-based virtual try-on is an increasingly popular and important task that generates realistic images of a specific person wearing a given garment. Recent methods model virtual try-on as an image inpainting task, which requires masking the person image and results in a significant loss of spatial information. In particular, for in-the-wild try-on scenarios with complex poses and occlusions, mask-based methods often introduce noticeable artifacts. Our research found that a mask-free approach can fully leverage spatial and lighting information from the original person image, enabling high-quality virtual try-on. Consequently, we propose a novel training paradigm for a mask-free try-on diffusion model. We ensure the model's mask-free try-on capability by creating high-quality pseudo-data and further enhance its handling of complex spatial information through effective in-the-wild data augmentation. In addition, a try-on localization loss is designed to concentrate on the try-on area while suppressing garment features in non-try-on areas, ensuring precise rendering of garments and preservation of the foreground and background. Finally, we introduce BooW-VTON, a mask-free virtual try-on diffusion model that delivers SOTA try-on quality without any parsing cost. Extensive qualitative and quantitative experiments demonstrate superior performance in in-the-wild scenarios with such low-demand inputs.
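The abstract does not give the exact form of the try-on localization loss, but a minimal sketch of the idea, weighting the denoising objective toward the try-on region while penalizing garment influence outside it, might look as follows. All names here, the `lambda_bg` weight, and the use of a garment cross-attention map are assumptions for illustration, not the paper's actual formulation:

```python
import torch
import torch.nn.functional as F

def tryon_localization_loss(noise_pred, noise_gt, tryon_mask, garment_attn,
                            lambda_bg=0.1):
    """Hypothetical sketch of a try-on localization loss.

    noise_pred, noise_gt: predicted / target noise from the diffusion UNet, (B, C, H, W).
    tryon_mask:   1 inside the try-on region, 0 elsewhere, (B, 1, H, W);
                  available at training time from the pseudo-data, not at inference.
    garment_attn: aggregated cross-attention response to garment tokens, (B, 1, H, W).
    """
    per_pixel = F.mse_loss(noise_pred, noise_gt, reduction="none")
    # Concentrate the denoising objective on the try-on area.
    focus = (per_pixel * tryon_mask).sum() / tryon_mask.sum().clamp(min=1.0)
    # Suppress garment features that leak into non-try-on areas,
    # preserving the original foreground and background.
    leak = (garment_attn * (1.0 - tryon_mask)).mean()
    return focus + lambda_bg * leak
```

Note that the mask is consumed only by the training loss; at inference the trained model takes just the person and garment images, which is what makes the approach parsing-free at test time.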
Related papers
- Try-On-Adapter: A Simple and Flexible Try-On Paradigm [42.2724473500475]
Image-based virtual try-on, widely used in online shopping, aims to generate images of a naturally dressed person conditioned on certain garments.
Previous methods focus on masking certain parts of the original model's standing image and then inpainting the masked areas to generate realistic images of the model wearing the corresponding reference garments.
We propose Try-On-Adapter (TOA), an outpainting paradigm that differs from the existing inpainting paradigm.
arXiv Detail & Related papers (2024-11-15T13:35:58Z)
- Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Ours is a relatively unified approach and so it is resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z)
- Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On [29.217423805933727]
Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks.
We propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results.
We also propose a novel diffusion-based method that predicts a precise inpainting mask based on the person and reference garment images.
arXiv Detail & Related papers (2024-04-01T12:43:22Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images.
We introduce customized solutions by fully exploiting the aforementioned free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z)
- Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar [62.87222308616711]
We propose Neural Point-based Volumetric Avatar (NPVA), a method that adopts the neural point representation and the neural volume rendering process.
Specifically, the neural points are strategically constrained around the surface of the target expression via a high-resolution UV displacement map.
By design, our NPVA is better equipped to handle topologically changing regions and thin structures while also ensuring accurate expression control when animating avatars.
arXiv Detail & Related papers (2023-07-11T03:40:10Z)
- Controllable Inversion of Black-Box Face Recognition Models via Diffusion [8.620807177029892]
We tackle the task of inverting the latent space of pre-trained face recognition models without full model access.
We show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution.
Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process.
arXiv Detail & Related papers (2023-03-23T03:02:09Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks (a sketch of differentiable mask sampling appears after this list).
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
We propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points.
Our method can concentrate on modeling regional geometry and enjoy less ambiguity for masked reconstruction.
Combining informative-preserved reconstruction on masked areas with consistency self-distillation from unmasked areas yields a unified framework called MM-3DScene.
arXiv Detail & Related papers (2022-12-20T01:53:40Z)
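The AutoMAE entry above mentions using Gumbel-Softmax to link an adversarially trained mask generator with masked image modeling. As a rough illustration of how mask sampling can be made differentiable, here is a generic straight-through Gumbel top-k sketch; this is not AutoMAE's actual implementation, and `mask_ratio` and `tau` are assumed hyperparameters:

```python
import torch

def sample_patch_mask(logits, mask_ratio=0.75, tau=1.0):
    """Differentiable patch-mask sampling with Gumbel noise (sketch).

    logits: per-patch masking scores from a learned generator, shape (B, N).
    Returns a {0, 1}-valued mask over patches whose top-k entries (by
    Gumbel-perturbed score) are masked; gradients flow back to the generator.
    """
    # Gumbel(0, 1) noise makes the top-k selection a stochastic sample.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-9) + 1e-9)
    scores = (logits + gumbel) / tau
    k = int(mask_ratio * logits.shape[1])
    # Soft relaxation used only for the backward pass.
    soft = torch.sigmoid(scores)
    # Hard top-k selection for the forward pass.
    topk = scores.topk(k, dim=1).indices
    hard = torch.zeros_like(soft).scatter(1, topk, 1.0)
    # Straight-through estimator: forward = hard, backward = gradient of soft.
    return hard + (soft - soft.detach())
```

The straight-through trick lets the mask generator receive gradients from the downstream reconstruction loss even though the mask applied to the image is hard.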