ClothFormer: Taming Video Virtual Try-on in All Module
- URL: http://arxiv.org/abs/2204.12151v1
- Date: Tue, 26 Apr 2022 08:40:28 GMT
- Title: ClothFormer: Taming Video Virtual Try-on in All Module
- Authors: Jianbin Jiang, Tan Wang, He Yan, Junhui Liu
- Abstract summary: Video virtual try-on aims to fit the target clothes to a person in the video with spatio-temporally consistent results.
The ClothFormer framework successfully synthesizes realistic, temporally consistent results in complicated environments.
- Score: 12.084652803378598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of video virtual try-on aims to fit the target clothes to a person
in the video with spatio-temporal consistency. Despite tremendous progress in
image-based virtual try-on, existing methods lead to inconsistency between frames when applied to
videos. Limited work also explored the task of video-based virtual try-on but
failed to produce visually pleasing and temporally coherent results. Moreover,
there are two other key challenges: 1) how to generate accurate warping when
occlusions appear in the clothing region; 2) how to generate clothes and
non-target body parts (e.g., arms, neck) in harmony with the complicated
background. To address these challenges, we propose a novel video virtual try-on
framework, ClothFormer, which successfully synthesizes realistic, harmonious, and
spatio-temporally consistent results in complicated environments. In particular,
ClothFormer involves three major modules. First, a two-stage anti-occlusion
warping module predicts an accurate dense flow mapping between the body
regions and the clothing regions. Second, an appearance-flow tracking module
utilizes ridge regression and optical flow correction to smooth the dense flow
sequence and generate a temporally smooth warped clothing sequence. Third, a
dual-stream transformer extracts and fuses clothing textures, person features,
and environment information to generate realistic try-on videos. Through
rigorous experiments, we demonstrate that our method significantly surpasses the
baselines in terms of synthesized video quality both qualitatively and
quantitatively.
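The appearance-flow tracking module is only described at a high level here. As a rough illustration of one ingredient, the sketch below shows how ridge regression could be used to temporally smooth a dense flow sequence: each per-pixel flow trajectory is fit with a low-degree polynomial in time under an L2 penalty. The function name, the polynomial temporal basis, and the tensor shapes are assumptions for illustration only, and the optical-flow correction the paper also uses is omitted.
```python
# Hypothetical sketch of temporal flow smoothing via ridge regression.
# Not the authors' code; shapes, names, and the basis are assumptions.
import numpy as np

def smooth_flow_sequence(flows, degree=2, lam=1e-2):
    """Smooth a dense flow sequence over time.

    flows: array of shape (T, H, W, 2) -- per-frame dense flow fields.
    degree: degree of the temporal polynomial basis.
    lam: ridge (L2) regularization strength.
    Returns an array of the same shape with temporally smoothed flows.
    """
    T, H, W, C = flows.shape
    t = np.linspace(-1.0, 1.0, T)                       # normalized time axis
    A = np.stack([t**d for d in range(degree + 1)], 1)  # (T, degree+1) design matrix

    # Closed-form ridge solution: coeffs = (A^T A + lam*I)^-1 A^T Y
    gram = A.T @ A + lam * np.eye(degree + 1)
    Y = flows.reshape(T, -1)                            # (T, H*W*2)
    coeffs = np.linalg.solve(gram, A.T @ Y)             # (degree+1, H*W*2)

    # Evaluate the fitted per-pixel trajectories at every frame.
    return (A @ coeffs).reshape(T, H, W, C)

# Example: smooth a random 8-frame 64x48 flow sequence.
flows = np.random.randn(8, 64, 48, 2).astype(np.float32)
print(smooth_flow_sequence(flows).shape)  # (8, 64, 48, 2)
```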
Related papers
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- ViViD: Video Virtual Try-on using Diffusion Models [46.710863047471264]
Video virtual try-on aims to transfer a clothing item onto the video of a target person.
Previous video-based try-on solutions can only generate blurry results of low visual quality.
We present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on.
arXiv Detail & Related papers (2024-05-20T05:28:22Z)
- AniDress: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model [58.035758145894846]
We introduce AniDress, a novel method for generating animatable human avatars in loose clothes using very sparse multi-view videos.
A pose-driven deformable neural radiance field conditioned on both body and garment motions is introduced, providing explicit control of both parts.
Our method is able to render natural garment dynamics that deviate highly from the body and generalizes well to both unseen views and poses.
arXiv Detail & Related papers (2024-01-27T08:48:18Z)
- PERGAMO: Personalized 3D Garments from Monocular Video [6.8338761008826445]
PERGAMO is a data-driven approach to learn a deformable model for 3D garments from monocular images.
We first introduce a novel method to reconstruct the 3D geometry of garments from a single image, and use it to build a dataset of clothing from monocular videos.
We show that our method is capable of producing garment animations that match real-world behaviour, and generalizes to unseen body motions extracted from a motion capture dataset.
arXiv Detail & Related papers (2022-10-26T21:15:54Z)
- Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing [85.74977256940855]
We propose an Arbitrary Virtual Try-On Network (AVTON) for all-type clothes.
AVTON can synthesize realistic try-on images by preserving and trading off characteristics of the target clothes and the reference person.
Our approach can achieve better performance compared with the state-of-the-art virtual try-on methods.
arXiv Detail & Related papers (2021-11-24T08:59:56Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- MV-TON: Memory-based Video Virtual Try-on network [49.496817042974456]
We propose a Memory-based Video virtual Try-On Network (MV-TON).
MV-TON seamlessly transfers desired clothes to a target person without using any clothing templates and generates high-resolution realistic videos.
Experimental results show the effectiveness of our method in the video virtual try-on task and its superiority over other existing methods.
arXiv Detail & Related papers (2021-08-17T08:35:23Z)
- MonoClothCap: Towards Temporally Coherent Clothing Capture from Monocular RGB Video [10.679773937444445]
We present a method to capture temporally coherent dynamic clothing deformation from a monocular RGB video input.
We build statistical deformation models for three types of clothing: T-shirt, short pants and long pants.
Our method produces temporally coherent reconstruction of body and clothing from monocular video.
arXiv Detail & Related papers (2020-09-22T17:54:38Z)