Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild
- URL: http://arxiv.org/abs/2406.15331v1
- Date: Fri, 21 Jun 2024 17:45:37 GMT
- Title: Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild
- Authors: Nadav Orzech, Yotam Nitzan, Ulysse Mizrahi, Dov Danon, Amit H. Bermano
- Abstract summary: Virtual Try-On aims to replace a piece of garment in an image with one from another, while preserving person and garment characteristics as well as image fidelity.
Current literature takes a supervised approach for the task, impairing generalization and imposing heavy computation.
We present a novel zero-shot training-free method for inpainting a clothing garment by reference.
- Score: 17.025262797698364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual Try-On (VTON) is a highly active line of research, with increasing demand. It aims to replace a piece of garment in an image with one from another, while preserving person and garment characteristics as well as image fidelity. Current literature takes a supervised approach for the task, impairing generalization and imposing heavy computation. In this paper, we present a novel zero-shot training-free method for inpainting a clothing garment by reference. Our approach employs the prior of a diffusion model with no additional training, fully leveraging its native generalization capabilities. The method employs extended attention to transfer image information from reference to target images, overcoming two significant challenges. We first warp the reference garment over the target human using deep features, alleviating "texture sticking". We then leverage the extended attention mechanism with careful masking, eliminating leakage of reference background and unwanted influence. Through a user study and qualitative and quantitative comparisons to state-of-the-art approaches, we demonstrate superior image quality and garment preservation for unseen clothing pieces and human figures.
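The core mechanism the abstract describes, masked extended attention, can be illustrated in a few lines. The following is a minimal PyTorch sketch, not the authors' implementation: it assumes pre-computed query/key/value tokens for the target and reference images and a boolean garment mask over the reference tokens, and the function name and mask semantics are our own assumptions.

```python
import torch

def masked_extended_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref, ref_garment_mask):
    """Extended attention: target queries attend over target AND reference
    tokens, with reference-background tokens masked out (a sketch of the
    idea in the abstract, not the paper's actual code).

    q_tgt, k_tgt, v_tgt: (B, N_tgt, D) tokens from the target (person) image
    k_ref, v_ref:        (B, N_ref, D) tokens from the reference (garment) image
    ref_garment_mask:    (B, N_ref) boolean, True on garment, False on background
    """
    n_tgt, d = k_tgt.shape[1], q_tgt.shape[-1]
    # Extend the key/value set with the reference-image tokens.
    k = torch.cat([k_tgt, k_ref], dim=1)                 # (B, N_tgt + N_ref, D)
    v = torch.cat([v_tgt, v_ref], dim=1)
    scores = q_tgt @ k.transpose(-1, -2) / d ** 0.5      # (B, N_tgt, N_tgt + N_ref)
    # Careful masking: block attention to reference background so it cannot
    # leak into the generated target image.
    scores[..., n_tgt:] = scores[..., n_tgt:].masked_fill(
        ~ref_garment_mask.unsqueeze(1), float("-inf"))
    return scores.softmax(dim=-1) @ v                    # (B, N_tgt, D)
```

In a diffusion UNet this would replace the self-attention computation at chosen layers, with the reference tokens produced by running the same layers on the (warped) reference image.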
Related papers
- Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On [29.217423805933727]
Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks.
We propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results.
We also propose a novel diffusion-based method that predicts a precise inpainting mask based on the person and reference garment images.
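To make the input/output contract of such a mask predictor concrete, here is a purely illustrative stand-in: the TPD paper derives the mask with a diffusion model, whereas this tiny convolutional net only shows the shape of the problem, and every name in it is an assumption.

```python
import torch
import torch.nn as nn

class MaskPredictorStandIn(nn.Module):
    """Hypothetical stand-in: person + reference garment images in,
    soft try-on inpainting mask out. Not the TPD architecture."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 1),
        )

    def forward(self, person, garment):
        # person, garment: (B, 3, H, W); returned mask: (B, 1, H, W) in (0, 1)
        return torch.sigmoid(self.net(torch.cat([person, garment], dim=1)))
```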
arXiv Detail & Related papers (2024-04-01T12:43:22Z) - Extracting Human Attention through Crowdsourced Patch Labeling [18.947126675569667]
In image classification, a significant problem arises from bias in the datasets.
One approach to mitigate such biases is to direct the model's attention toward the target object's location.
We propose a novel patch-labeling method that integrates AI assistance with crowdsourcing to capture human attention from images.
arXiv Detail & Related papers (2024-03-22T07:57:27Z) - Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z) - StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On [35.227896906556026]
Given a clothing image and a person image, an image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image.
In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task.
Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process.
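The "zero cross-attention" idea can be sketched as a cross-attention block whose output projection is zero-initialized (in the spirit of ControlNet's zero layers), so it acts as an identity residual at the start of training and only gradually injects garment information. The exact StableVITON architecture is not given here; this block and its names are assumptions.

```python
import torch
import torch.nn as nn

class ZeroCrossAttention(nn.Module):
    """Sketch of a zero-initialized cross-attention block. Queries come
    from the UNet's person features, keys/values from garment features;
    the zero-initialized output projection makes the block a no-op at
    step 0, preserving the pre-trained diffusion prior."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.out = nn.Linear(dim, dim)
        nn.init.zeros_(self.out.weight)  # identity residual at initialization
        nn.init.zeros_(self.out.bias)

    def forward(self, person_feats, garment_feats):
        # person_feats: (B, N_p, D), garment_feats: (B, N_g, D)
        attended, _ = self.attn(person_feats, garment_feats, garment_feats)
        return person_feats + self.out(attended)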
arXiv Detail & Related papers (2023-12-04T08:27:59Z) - Style-Based Global Appearance Flow for Virtual Try-On [119.95115739956661]
A novel global appearance flow estimation model is proposed in this work.
Experiment results on a popular virtual try-on benchmark show that our method achieves new state-of-the-art performance.
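The warping operation at the heart of appearance-flow methods is simple to sketch: a dense flow field offsets a sampling grid and the garment is resampled through it. How the flow is predicted is the learned (style-based, global) part of the model and is assumed given here.

```python
import torch
import torch.nn.functional as F

def warp_garment(garment, flow):
    """Warp a garment image with a predicted appearance-flow field.

    garment: (B, C, H, W) garment image
    flow:    (B, 2, H, W) per-pixel (dx, dy) offsets in normalized [-1, 1] units
    """
    B, _, H, W = garment.shape
    # Identity sampling grid in normalized [-1, 1] coordinates, (x, y) order.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, H, W, 2)
    grid = base.to(garment) + flow.permute(0, 2, 3, 1)  # offset the identity grid
    return F.grid_sample(garment, grid, align_corners=True)
```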
arXiv Detail & Related papers (2022-04-03T10:58:04Z) - Dressing in the Wild by Watching Dance Videos [69.7692630502019]
This paper attends to virtual try-on in real-world scenes and brings improvements in authenticity and naturalness.
We propose a novel generative network called wFlow that effectively extends garment transfer to in-the-wild contexts.
arXiv Detail & Related papers (2022-03-29T08:05:45Z) - Progressive and Aligned Pose Attention Transfer for Person Image Generation [59.87492938953545]
This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.
We use two types of blocks, namely the Pose-Attentional Transfer Block (PATB) and the Aligned Pose-Attentional Transfer Block (APATB).
We verify the efficacy of the model on the Market-1501 and DeepFashion datasets, using quantitative and qualitative measures.
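A rough sketch of one such pose-attentional block follows; the exact layer layout is assumed. The key idea is that a pose pathway produces a sigmoid attention mask that gates a residual update of the image pathway, so each block in the cascade moves the appearance a step toward the target pose.

```python
import torch
import torch.nn as nn

class PoseAttentionalTransferBlock(nn.Module):
    """Assumed sketch of a PATB-style block, not the paper's exact design."""
    def __init__(self, dim):
        super().__init__()
        self.img_conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.InstanceNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.InstanceNorm2d(dim))
        self.mask_conv = nn.Sequential(
            nn.Conv2d(dim * 2, dim, 3, padding=1), nn.InstanceNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1))
        self.pose_update = nn.Conv2d(dim * 2, dim, 3, padding=1)

    def forward(self, img_code, pose_code):
        # img_code, pose_code: (B, dim, H, W)
        joint = torch.cat([img_code, pose_code], dim=1)
        mask = torch.sigmoid(self.mask_conv(joint))            # where to transfer
        img_code = img_code + mask * self.img_conv(img_code)   # gated residual
        pose_code = self.pose_update(torch.cat([img_code, pose_code], dim=1))
        return img_code, pose_code
```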
arXiv Detail & Related papers (2021-03-22T07:24:57Z) - PoNA: Pose-guided Non-local Attention for Human Pose Transfer [105.14398322129024]
We propose a new human pose transfer method using a generative adversarial network (GAN) with simplified cascaded blocks.
Our model generates sharper and more realistic images with rich details, while having fewer parameters and faster speed.
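Pose-guided non-local attention can be sketched as a non-local block in which pose features decide where each target position should look on the source person, and the selected image features are transferred accordingly. PoNA's exact block design is not given in the summary; the structure below is an assumption.

```python
import torch
import torch.nn as nn

class PoseGuidedNonLocalAttention(nn.Module):
    """Assumed sketch: queries from target pose, keys from source pose,
    values from source image features; long-range attention routes
    appearance from source positions to target positions."""
    def __init__(self, dim, inner=None):
        super().__init__()
        inner = inner or dim // 2
        self.q = nn.Conv2d(dim, inner, 1)
        self.k = nn.Conv2d(dim, inner, 1)
        self.v = nn.Conv2d(dim, dim, 1)

    def forward(self, tgt_pose, src_pose, src_img):
        B, D, H, W = src_img.shape
        q = self.q(tgt_pose).flatten(2).transpose(1, 2)          # (B, HW, inner)
        k = self.k(src_pose).flatten(2)                          # (B, inner, HW)
        v = self.v(src_img).flatten(2).transpose(1, 2)           # (B, HW, D)
        attn = torch.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)  # (B, HW, HW)
        return (attn @ v).transpose(1, 2).reshape(B, D, H, W)
```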
arXiv Detail & Related papers (2020-12-13T12:38:29Z) - Towards Photo-Realistic Virtual Try-On by Adaptively Generating$\leftrightarrow$Preserving Image Content [85.24260811659094]
We propose a novel visual try-on network, the Adaptive Content Generating and Preserving Network (ACGPN).
ACGPN first predicts the semantic layout of the reference image that will be changed after try-on.
Second, a clothes warping module warps clothing images according to the generated semantic layout.
Third, an inpainting module for content fusion integrates all information (e.g., reference image, semantic layout, warped clothes) to adaptively produce each semantic part of the human body.
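The three-stage structure just described can be written as a short pipeline skeleton. The callables stand in for the paper's actual networks, which are not specified here.

```python
def acgpn_try_on(reference_image, clothes_image, layout_net, warp_module, inpaint_net):
    """Skeleton of the ACGPN pipeline as summarized above; the three
    callables are hypothetical placeholders for the real networks."""
    # 1. Predict the post-try-on semantic layout of the reference image.
    layout = layout_net(reference_image, clothes_image)
    # 2. Warp the clothes to agree with the predicted layout.
    warped_clothes = warp_module(clothes_image, layout)
    # 3. Fuse everything and inpaint each semantic part of the body.
    return inpaint_net(reference_image, layout, warped_clothes)
```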
arXiv Detail & Related papers (2020-03-12T15:55:39Z) - GarmentGAN: Photo-realistic Adversarial Fashion Transfer [0.0]
GarmentGAN performs image-based garment transfer through generative adversarial methods.
The framework allows users to virtually try-on items before purchase and generalizes to various apparel types.
arXiv Detail & Related papers (2020-03-04T05:01:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.