High-Fidelity Virtual Try-on with Large-Scale Unpaired Learning
- URL: http://arxiv.org/abs/2411.01593v1
- Date: Sun, 03 Nov 2024 15:00:26 GMT
- Title: High-Fidelity Virtual Try-on with Large-Scale Unpaired Learning
- Authors: Han Yang, Yanlong Zang, Ziwei Liu
- Abstract summary: Virtual try-on (VTON) transfers a target clothing image to a reference person, where clothing fidelity is a key requirement for downstream e-commerce applications.
We propose a novel framework, Boosted Virtual Try-on (BVTON), that leverages large-scale unpaired learning for high-fidelity try-on.
- Score: 36.7085107012134
- License:
- Abstract: Virtual try-on (VTON) transfers a target clothing image to a reference person, where clothing fidelity is a key requirement for downstream e-commerce applications. However, existing VTON methods still fall short in high-fidelity try-on due to the conflict between the high diversity of dressing styles (e.g., clothes occluded by pants or distorted by posture) and the limited paired data for training. In this work, we propose a novel framework, Boosted Virtual Try-on (BVTON), to leverage large-scale unpaired learning for high-fidelity try-on. Our key insight is that pseudo try-on pairs can be reliably constructed from widely available fashion images. Specifically, 1) we first propose a compositional canonicalizing flow that maps on-model clothes into pseudo in-shop clothes, dubbed the canonical proxy. Each clothing part (sleeves, torso) is reversely deformed into an in-shop-like shape to compositionally construct the canonical proxy. 2) Next, we design a layered mask generation module that generates an accurate semantic layout by training on the canonical proxy. We replace the in-shop clothes used in conventional pipelines with the derived canonical proxy to boost the training process. 3) Finally, we propose an unpaired try-on synthesizer by constructing pseudo training pairs with randomly misaligned on-model clothes, where intricate skin texture and clothes boundaries can be generated. Extensive experiments on high-resolution (1024×768) datasets demonstrate the superiority of our approach over state-of-the-art methods both qualitatively and quantitatively. Notably, BVTON shows great generalizability and scalability to various dressing styles and data sources.
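The idea of building pseudo training pairs from "randomly misaligned on-model clothes" can be illustrated with a rough sketch. This is not the authors' implementation: the function names `shift_with_zero_fill` and `make_pseudo_pair`, the `max_shift` parameter, and the use of a plain random translation as the misalignment are all assumptions made for illustration; the paper may use a different perturbation.

```python
import numpy as np

def shift_with_zero_fill(img, dy, dx):
    """Translate an H x W (x C) array by (dy, dx) pixels; exposed area is zero."""
    h, w = img.shape[:2]
    dy = int(np.clip(dy, -h, h))
    dx = int(np.clip(dx, -w, w))
    out = np.zeros_like(img)
    # Source and destination windows of the region that stays in frame.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def make_pseudo_pair(person, cloth_mask, rng, max_shift=8):
    """Build one pseudo pair from a single on-model image (hypothetical helper).

    person:     H x W x 3 float array, the fashion image.
    cloth_mask: H x W bool array, True where the garment is.
    Returns (misaligned clothes, clothing-agnostic person, target image).
    """
    clothes = person * cloth_mask[..., None]           # cut out on-model clothes
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    misaligned = shift_with_zero_fill(clothes, dy, dx)  # assumed misalignment: translation
    agnostic = person * ~cloth_mask[..., None]          # person with garment removed
    return misaligned, agnostic, person                 # the target is the original image

# Toy usage on a synthetic 6 x 6 image with a 2 x 2 "garment".
rng = np.random.default_rng(0)
person = np.ones((6, 6, 3))
cloth_mask = np.zeros((6, 6), dtype=bool)
cloth_mask[2:4, 2:4] = True
misaligned, agnostic, target = make_pseudo_pair(person, cloth_mask, rng, max_shift=1)
```

Because the input clothes no longer line up with the target, a synthesizer trained on such pairs cannot simply copy pixels and must learn to realign the garment, which is the motivation the abstract gives for the misalignment.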