MV-VTON: Multi-View Virtual Try-On with Diffusion Models
- URL: http://arxiv.org/abs/2404.17364v4
- Date: Sun, 05 Jan 2025 14:31:28 GMT
- Title: MV-VTON: Multi-View Virtual Try-On with Diffusion Models
- Authors: Haoyu Wang, Zhilu Zhang, Donglin Di, Shiliang Zhang, Wangmeng Zuo
- Abstract summary: The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing.
Existing methods focus solely on frontal try-on using frontal clothing.
We introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes.
- Score: 91.71150387151042
- Abstract: The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing. However, existing methods focus solely on frontal try-on using frontal clothing. When the views of the clothing and person differ significantly, particularly when the person's view is non-frontal, the results are unsatisfactory. To address this challenge, we introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results from multiple views using the given clothes. Since a single clothing view provides insufficient information for MV-VTON, we instead employ two images, i.e., the frontal and back views of the clothing, to cover the complete view as much as possible. Moreover, we adopt diffusion models, which have demonstrated superior generative ability, to perform MV-VTON. In particular, we propose a view-adaptive selection method in which hard selection and soft selection are applied to global and local clothing feature extraction, respectively. This ensures that the clothing features roughly match the person's view. Subsequently, we propose joint attention blocks to align and fuse clothing features with person features. Additionally, we collect an MV-VTON dataset, MVG, in which each person has multiple photos with diverse views and poses. Experiments show that the proposed method not only achieves state-of-the-art results on the MV-VTON task using our MVG dataset, but is also superior on the frontal-view virtual try-on task using the VITON-HD and DressCode datasets.
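To make the two mechanisms named in the abstract more concrete, here is a minimal PyTorch sketch of (i) hard selection between frontal and back clothing features based on an estimated person-view angle, and (ii) a joint attention block that fuses clothing tokens into person tokens. All module names, the angle convention, and the residual wiring are illustrative assumptions, not the authors' implementation; the paper's soft selection over local features is omitted here.

```python
# Minimal sketch of view-adaptive hard selection and joint attention.
# Names (ViewAdaptiveSelect, JointAttentionBlock) are hypothetical.
import torch
import torch.nn as nn


class ViewAdaptiveSelect(nn.Module):
    """Hard-select the frontal or back clothing feature per sample,
    based on an estimated person-view angle (assumed convention:
    0 degrees = frontal, 180 degrees = back)."""

    def forward(self, front_feat, back_feat, view_angle_deg):
        # view_angle_deg: (B,); pick the back view beyond 90 degrees.
        use_back = (view_angle_deg.abs() > 90).view(-1, 1, 1)  # (B,1,1)
        return torch.where(use_back, back_feat, front_feat)


class JointAttentionBlock(nn.Module):
    """Attend over concatenated person and clothing tokens, then keep
    only the person part of the fused output."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, person_tokens, cloth_tokens):
        joint = torch.cat([person_tokens, cloth_tokens], dim=1)  # (B, Np+Nc, D)
        fused, _ = self.attn(joint, joint, joint)
        fused = self.norm(joint + fused)  # residual + norm
        return fused[:, : person_tokens.shape[1]]  # person tokens only


B, N, D = 2, 64, 320
cloth = ViewAdaptiveSelect()(torch.randn(B, N, D), torch.randn(B, N, D),
                             view_angle_deg=torch.tensor([30.0, 150.0]))
out = JointAttentionBlock(D)(torch.randn(B, N, D), cloth)
print(out.shape)  # torch.Size([2, 64, 320])
```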
Related papers
- MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer [5.844515709826269]
Garment-to-person virtual try-on (VTON) aims to generate fitting images of a person wearing a reference garment.
To improve ease of use, we propose a Mask-Free framework for Person-to-Person VTON.
Our model excels in both person-to-person and garment-to-person VTON tasks, generating high-fidelity fitting images.
arXiv Detail & Related papers (2025-02-03T18:56:24Z)
- Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks [31.461116368933165]
Image-based virtual try-on (VTON) aims to generate a virtual try-on result by transferring an input garment onto a target person's image.
The scarcity of paired garment-model data makes it challenging for existing methods to achieve high generalization and quality in VTON.
We propose Any2AnyTryon, which can generate try-on results based on different textual instructions and model garment images.
arXiv Detail & Related papers (2025-01-27T09:33:23Z)
- Try-On-Adapter: A Simple and Flexible Try-On Paradigm [42.2724473500475]
Image-based virtual try-on, widely used in online shopping, aims to generate images of a naturally dressed person conditioned on certain garments.
Previous methods mask certain parts of the model's standing image and then inpaint the masked areas to generate realistic images of the model wearing the corresponding reference garments.
We propose Try-On-Adapter (TOA), an outpainting paradigm that differs from the existing inpainting paradigm.
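For context on the inpainting paradigm that TOA departs from, the sketch below shows how mask-based try-on methods typically condition a diffusion model: the garment region of the person image is zeroed out and concatenated with the mask as conditioning input. The function name and channel layout are assumptions for illustration, not any specific method's API.

```python
# Sketch of the mask-and-inpaint conditioning used by most prior
# try-on methods (the paradigm Try-On-Adapter replaces).
import torch


def masked_tryon_input(person, garment_mask):
    """Zero out the garment region so the model must inpaint it.

    person:       (B, 3, H, W) image in [-1, 1]
    garment_mask: (B, 1, H, W), 1 where clothing should be generated
    """
    masked_person = person * (1.0 - garment_mask)
    # Typical inpainting conditioning: masked image plus mask channel,
    # later concatenated with the noisy latent along channels.
    return torch.cat([masked_person, garment_mask], dim=1)  # (B, 4, H, W)


person = torch.rand(1, 3, 256, 192) * 2 - 1
mask = torch.zeros(1, 1, 256, 192)
mask[:, :, 64:192, 48:144] = 1.0  # assumed torso region
print(masked_tryon_input(person, mask).shape)  # torch.Size([1, 4, 256, 192])
```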
arXiv Detail & Related papers (2024-11-15T13:35:58Z)
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 addresses the virtual dressing task: generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from a VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
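A rough sketch of such a hybrid attention block is below, assuming a frozen self-attention branch (standing in for pretrained denoising-UNet weights), a trainable cross-attention branch over garment tokens, and additive fusion; the actual IMAGDressing-v1 wiring may differ.

```python
# Hypothetical sketch of a hybrid attention module: frozen self-attention
# plus trainable cross-attention injecting garment features.
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Freeze the self-attention; only the garment cross-attention trains.
        for p in self.self_attn.parameters():
            p.requires_grad = False

    def forward(self, x, garment_tokens):
        sa, _ = self.self_attn(x, x, x)
        ca, _ = self.cross_attn(x, garment_tokens, garment_tokens)
        return x + sa + ca  # assumed additive fusion


x = torch.randn(2, 77, 320)       # denoiser hidden tokens
g = torch.randn(2, 197, 320)      # garment-UNet tokens
print(HybridAttention(320)(x, g).shape)  # torch.Size([2, 77, 320])
```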
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- M&M VTO: Multi-Garment Virtual Try-On and Editing [31.45715245587691]
M&M VTO is a mix-and-match virtual try-on method that takes as input multiple garment images, a text description of the garment layout, and an image of a person.
An example input includes: an image of a shirt, an image of a pair of pants, "rolled sleeves, shirt tucked in", and an image of a person.
The output is a visualization of how those garments (in the desired layout) would look on the given person.
arXiv Detail & Related papers (2024-06-06T22:46:37Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing [85.74977256940855]
We propose an Arbitrary Virtual Try-On Network (AVTON) for all types of clothes.
AVTON can synthesize realistic try-on images by preserving and trading off characteristics of the target clothes and the reference person.
Our approach achieves better performance than state-of-the-art virtual try-on methods.
arXiv Detail & Related papers (2021-11-24T08:59:56Z)
- MV-TON: Memory-based Video Virtual Try-on Network [49.496817042974456]
We propose a Memory-based Video virtual Try-On Network (MV-TON).
MV-TON seamlessly transfers desired clothes to a target person without using any clothing templates and generates high-resolution realistic videos.
Experimental results show the effectiveness of our method in the video virtual try-on task and its superiority over other existing methods.
arXiv Detail & Related papers (2021-08-17T08:35:23Z)
- SPG-VTON: Semantic Prediction Guidance for Multi-pose Virtual Try-on [27.870740623131816]
Image-based virtual try-on is challenging because it must fit target in-shop clothes onto a reference person under diverse human poses.
We propose an end-to-end Semantic Prediction Guidance multi-pose Virtual Try-On Network (SPG-VTON).
We evaluate the proposed method on the largest multi-pose dataset (MPV) and the DeepFashion dataset.
arXiv Detail & Related papers (2021-08-03T15:40:50Z)