Dressing in the Wild by Watching Dance Videos
- URL: http://arxiv.org/abs/2203.15320v1
- Date: Tue, 29 Mar 2022 08:05:45 GMT
- Title: Dressing in the Wild by Watching Dance Videos
- Authors: Xin Dong, Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min
Zheng, Xiang Long, Xiaodan Liang, Jianchao Yang
- Abstract summary: This paper attends to virtual try-on in real-world scenes and brings improvements in authenticity and naturalness.
We propose a novel generative network called wFlow that effectively extends garment transfer to in-the-wild settings.
- Score: 69.7692630502019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While significant progress has been made in garment transfer, one of the most
applicable directions of human-centric image generation, existing works
overlook the in-the-wild imagery, presenting severe garment-person misalignment
as well as noticeable degradation in fine texture details. This paper,
therefore, attends to virtual try-on in real-world scenes and brings essential
improvements in authenticity and naturalness especially for loose garment
(e.g., skirts, formal dresses), challenging poses (e.g., cross arms, bent
legs), and cluttered backgrounds. Specifically, we find that the pixel flow
excels at handling loose garments whereas the vertex flow is preferred for hard
poses, and by combining their advantages we propose a novel generative network
called wFlow that effectively extends garment transfer to in-the-wild settings.
Moreover, prior approaches require paired images for training.
Instead, we cut down the laboriousness by working on a newly constructed
large-scale video dataset named Dance50k with self-supervised cross-frame
training and an online cycle optimization. The proposed Dance50k can boost
real-world virtual dressing by covering a wide variety of garments under
dancing poses. Extensive experiments demonstrate the superiority of our wFlow
in generating realistic garment transfer results for in-the-wild images without
resorting to expensive paired datasets.
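The abstract describes wFlow only at a high level: a 2D pixel flow (strong for loose garments) and a 3D vertex flow (strong for hard poses) are combined to warp the garment before synthesis. As an illustration of that combination only, the following is a minimal PyTorch-style sketch of one plausible fusion scheme; all module and variable names are hypothetical and this is not the authors' implementation.

```python
# Hypothetical sketch (not the wFlow code): warp a garment with a pixel flow and
# a vertex-derived flow, then blend the two warps with a predicted soft mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` (B,C,H,W) by a dense flow field (B,2,H,W) given in pixels."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=image.device),
        torch.arange(w, device=image.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float()              # (2,H,W) base coordinates
    coords = grid.unsqueeze(0) + flow                        # displaced sampling coordinates
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0            # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B,H,W,2) for grid_sample
    return F.grid_sample(image, sample_grid, align_corners=True)

class FlowFusion(nn.Module):
    """Blend pixel-flow and vertex-flow warps with a learned soft mask."""
    def __init__(self, channels=3):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(2 * channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, garment, pixel_flow, vertex_flow):
        warp_pix = warp(garment, pixel_flow)    # pixel flow: better for loose garments
        warp_vert = warp(garment, vertex_flow)  # vertex flow: better for hard poses
        m = self.mask_head(torch.cat([warp_pix, warp_vert], dim=1))
        return m * warp_pix + (1 - m) * warp_vert
```

Under the same caveat, the self-supervised cross-frame training mentioned in the abstract could be realized by warping the garment from one frame of a Dance50k video toward another frame of the same person and penalizing the reconstruction error; the online cycle optimization is not sketched here.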
Related papers
- IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 tackles a virtual dressing task: generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet (a minimal sketch of this pattern appears after the related-papers list).
arXiv Detail & Related papers (2024-07-17T16:26:30Z)
- GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon [5.790630195329777]
We introduce a novel graph based warping technique which emphasizes the value of context in garment flow.
Our method, validated on VITON-HD and Dresscode datasets, showcases substantial improvement in garment warping, texture preservation, and overall realism.
arXiv Detail & Related papers (2024-06-04T10:29:18Z)
- AniDress: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model [58.035758145894846]
We introduce AniDress, a novel method for generating animatable human avatars in loose clothes using very sparse multi-view videos.
A pose-driven deformable neural radiance field conditioned on both body and garment motions is introduced, providing explicit control of both parts.
Our method renders natural garment dynamics that deviate strongly from the body, and it generalizes well to both unseen views and poses.
arXiv Detail & Related papers (2024-01-27T08:48:18Z)
- StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On [35.227896906556026]
Given a clothing image and a person image, an image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image.
In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task.
Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process.
arXiv Detail & Related papers (2023-12-04T08:27:59Z)
- Fill in Fabrics: Body-Aware Self-Supervised Inpainting for Image-Based Virtual Try-On [3.5698678013121334]
We propose a self-supervised conditional generative adversarial network framework comprising a Fabricator together with a Segmenter, a Warper, and a Fuser.
The Fabricator reconstructs the clothing image when provided with a masked clothing as input, and learns the overall structure of the clothing by filling in fabrics.
A virtual try-on pipeline is then trained by transferring the learned representations from the Fabricator to Warper in an effort to warp and refine the target clothing.
arXiv Detail & Related papers (2022-10-03T13:25:31Z)
- Dressing Avatars: Deep Photorealistic Appearance for Physically Simulated Clothing [49.96406805006839]
We introduce pose-driven avatars with explicit modeling of clothing that exhibit both realistic clothing dynamics and photorealistic appearance learned from real-world data.
Our key contribution is a physically-inspired appearance network, capable of generating photorealistic appearance with view-dependent and dynamic shadowing effects even for unseen body-clothing configurations.
arXiv Detail & Related papers (2022-06-30T17:58:20Z)
- Per Garment Capture and Synthesis for Real-time Virtual Try-on [15.128477359632262]
Existing image-based works try to synthesize a try-on image from a single image of a target garment.
It is difficult to reproduce the changes in wrinkles caused by variations in pose and body size, as well as the pulling and stretching of the garment by hand.
We propose an alternative per garment capture and synthesis workflow to handle such rich interactions by training the model with many systematically captured images.
arXiv Detail & Related papers (2021-09-10T03:49:37Z)
- Style and Pose Control for Image Synthesis of Humans from a Single Monocular View [78.6284090004218]
StylePoseGAN extends a non-controllable generator to accept conditioning on pose and appearance separately.
Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts.
StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics.
arXiv Detail & Related papers (2021-02-22T18:50:47Z)
- Single-Shot Freestyle Dance Reenactment [89.91619150027265]
The task of motion transfer between a source dancer and a target person is a special case of the pose transfer problem.
We propose a novel method that can reanimate a single image by arbitrary video sequences, unseen during training.
arXiv Detail & Related papers (2020-12-02T12:57:43Z)
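The IMAGDressing-v1 entry above describes a hybrid attention module that pairs a frozen self-attention with a trainable cross-attention over garment features. The summary gives no implementation details, so the following is only a minimal PyTorch-style sketch of that general pattern; the module name, dimensions, and layer layout are hypothetical and this is not the authors' code.

```python
# Hypothetical sketch (not the IMAGDressing-v1 implementation): keep the pretrained
# self-attention frozen and add a trainable cross-attention over garment tokens.
import torch
import torch.nn as nn

class HybridAttentionBlock(nn.Module):
    def __init__(self, dim=320, num_heads=8):
        super().__init__()
        # Frozen self-attention, standing in for a layer of the pretrained denoising UNet.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        # Trainable cross-attention that injects garment features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_self = nn.LayerNorm(dim)
        self.norm_cross = nn.LayerNorm(dim)

    def forward(self, x, garment_tokens):
        # x: (B, N, dim) latent tokens; garment_tokens: (B, M, dim) from a garment encoder.
        h = self.norm_self(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        h = self.norm_cross(x)
        x = x + self.cross_attn(h, garment_tokens, garment_tokens, need_weights=False)[0]
        return x

# Usage: out = HybridAttentionBlock()(latent_tokens, garment_tokens)
```

Freezing the self-attention preserves the generative prior of the pretrained model, while the added cross-attention is the only part that learns to attend to the garment features; the same residual-attention pattern could also illustrate the zero cross-attention blocks mentioned in the StableVITON entry, though the details there differ.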