High-Fidelity Neural Human Motion Transfer from Monocular Video
- URL: http://arxiv.org/abs/2012.10974v1
- Date: Sun, 20 Dec 2020 16:54:38 GMT
- Title: High-Fidelity Neural Human Motion Transfer from Monocular Video
- Authors: Moritz Kappel and Vladislav Golyanik and Mohamed Elgharib and Jann-Ole Henningson
  and Hans-Peter Seidel and Susana Castillo and Christian Theobalt and Marcus Magnor
- Abstract summary: Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
- Score: 71.75576402562247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based human motion transfer creates video animations of humans
following a source motion. Current methods show remarkable results for
tightly-clad subjects. However, the lack of temporally consistent handling of
plausible clothing dynamics, including fine and high-frequency details,
significantly limits the attainable visual quality. We address these
limitations for the first time in the literature and present a new framework
which performs high-fidelity and temporally-consistent human motion transfer
with natural pose-dependent non-rigid deformations, for several types of loose
garments. In contrast to the previous techniques, we perform image generation
in three subsequent stages, synthesizing human shape, structure, and
appearance. Given a monocular RGB video of an actor, we train a stack of
recurrent deep neural networks that generate these intermediate representations
from 2D poses and their temporal derivatives. Splitting the difficult motion
transfer problem into subtasks that are aware of the temporal motion context
helps us to synthesize results with plausible dynamics and pose-dependent
detail. It also allows artistic control of results by manipulation of
individual framework stages. In the experimental results, we significantly
outperform the state-of-the-art in terms of video realism. Our code and data
will be made publicly available.
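As a rough illustration of the pipeline described in the abstract, the sketch below wires a stack of recurrent stages that map 2D poses and their temporal derivatives first to a shape map, then to a structure map, and finally to an appearance frame. All concrete choices here (GRU recurrence, resolutions, channel counts, module names) are assumptions made for illustration, not the authors' architecture.

```python
# Hypothetical sketch of the three-stage idea from the abstract:
# 2D poses + temporal derivatives -> shape -> structure -> appearance.
# Layer choices (GRU recurrence, 64x64 resolution, channel counts) are assumptions.
import torch
import torch.nn as nn


class RecurrentStage(nn.Module):
    """One pipeline stage: a GRU over a latent code followed by a small image decoder."""

    def __init__(self, in_dim, out_channels, hidden=256, res=64):
        super().__init__()
        self.res = res
        self.out_channels = out_channels
        self.gru = nn.GRUCell(in_dim, hidden)        # carries temporal context across frames
        self.decode = nn.Linear(hidden, out_channels * res * res)  # latent -> coarse output map

    def forward(self, x, h):
        h = self.gru(x, h)                           # update the temporal state
        img = self.decode(h).view(-1, self.out_channels, self.res, self.res)
        return img, h


class MotionTransferPipeline(nn.Module):
    """Shape -> structure -> appearance, each stage conditioned on the previous one."""

    def __init__(self, num_joints=25, res=64):
        super().__init__()
        pose_dim = num_joints * 2 * 2                # 2D poses plus their temporal derivatives
        self.shape = RecurrentStage(pose_dim, 1, res=res)                       # silhouette-like shape map
        self.structure = RecurrentStage(pose_dim + 1 * res * res, 3, res=res)   # body/garment structure
        self.appearance = RecurrentStage(pose_dim + 3 * res * res, 3, res=res)  # final RGB frame

    def forward(self, pose, pose_velocity, states):
        x = torch.cat([pose, pose_velocity], dim=-1)
        shape, h1 = self.shape(x, states[0])
        structure, h2 = self.structure(torch.cat([x, shape.flatten(1)], dim=-1), states[1])
        frame, h3 = self.appearance(torch.cat([x, structure.flatten(1)], dim=-1), states[2])
        return frame, (shape, structure), (h1, h2, h3)


# Toy usage on a short sequence: per-stage hidden states start at zero.
model = MotionTransferPipeline()
states = [torch.zeros(1, 256) for _ in range(3)]
for _ in range(4):                                   # iterate over video frames
    pose = torch.randn(1, 25 * 2)
    velocity = torch.randn(1, 25 * 2)
    frame, intermediates, states = model(pose, velocity, states)
print(frame.shape)  # torch.Size([1, 3, 64, 64])
```

Keeping the stages separate is what enables the per-stage artistic control mentioned in the abstract: an intermediate shape or structure map can be inspected or edited before the next stage consumes it.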
Related papers
- Machine Learning Modeling for Multi-order Human Visual Motion Processing [5.043066132820344]
This research aims to develop machines that learn to perceive visual motion as do humans.
Our model architecture mimics the cortical V1-MT motion processing pathway.
We trained our dual-pathway model on novel motion datasets with varying material properties of moving objects.
arXiv Detail & Related papers (2025-01-22T11:41:41Z)
- Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism [52.9091817868613]
Video try-on is a promising area with tremendous real-world potential.
Previous research has primarily focused on transferring product clothing images to videos with simple human poses.
We propose a novel video try-on framework based on the Diffusion Transformer (DiT), named Dynamic Try-On.
arXiv Detail & Related papers (2024-12-13T03:20:53Z)
- Do As I Do: Pose Guided Human Motion Copy [39.40271266234068]
Motion copy is an intriguing yet challenging task in artificial intelligence and computer vision.
Existing approaches typically adopt a conventional GAN with an L1 or L2 loss to produce the target fake video.
We present an episodic memory module in the pose-to-appearance generation to propel continuous learning.
Our method significantly outperforms state-of-the-art approaches and gains 7.2% and 12.4% improvements in PSNR and FID respectively.
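For context, a minimal sketch of the conventional objective mentioned above, an adversarial GAN term combined with an L1 reconstruction loss between the generated and target frames, looks roughly as follows. The generator and discriminator stand in for arbitrary pose-to-video networks, and the loss weighting is an assumption.

```python
# Sketch of a conventional GAN + L1 objective for pose-to-video generation.
# `discriminator` is any callable producing real/fake logits; l1_weight is illustrative.
import torch
import torch.nn.functional as F

def generator_loss(discriminator, fake_frame, real_frame, l1_weight=10.0):
    # Non-saturating adversarial term: the generator wants D(fake) -> 1.
    logits = discriminator(fake_frame)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # L1 reconstruction term pulls the generated frame toward the ground-truth frame.
    recon = F.l1_loss(fake_frame, real_frame)
    return adv + l1_weight * recon

def discriminator_loss(discriminator, fake_frame, real_frame):
    real_logits = discriminator(real_frame)
    fake_logits = discriminator(fake_frame.detach())   # do not backprop into the generator here
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

# Toy usage with a trivial patch discriminator:
disc = torch.nn.Conv2d(3, 1, 4, stride=2, padding=1)
fake, real = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
g_loss, d_loss = generator_loss(disc, fake, real), discriminator_loss(disc, fake, real)
```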
arXiv Detail & Related papers (2024-06-24T12:41:51Z)
- MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
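A rough sketch of such a pretraining setup is given below, with a plain Transformer encoder standing in for the dual-stream DSTformer: 2D keypoint sequences are corrupted with noise and random masking, and the encoder is trained to regress the underlying 3D motion. All dimensions and corruption rates are illustrative assumptions.

```python
# Sketch: lift noisy, partially masked 2D keypoint sequences to 3D motion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionLifter(nn.Module):
    def __init__(self, num_joints=17, dim=256, layers=4):
        super().__init__()
        self.embed = nn.Linear(num_joints * 2, dim)                  # per-frame 2D pose -> token
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, num_joints * 3)                   # token -> per-frame 3D pose

    def forward(self, pose2d):                                       # (B, T, J*2)
        return self.head(self.encoder(self.embed(pose2d)))           # (B, T, J*3)

def pretrain_step(model, pose2d, pose3d, noise_std=0.02, mask_prob=0.15):
    # Simulate "noisy partial 2D observations": Gaussian jitter plus randomly zeroed entries.
    noisy = pose2d + noise_std * torch.randn_like(pose2d)
    mask = (torch.rand_like(pose2d) > mask_prob).float()
    pred3d = model(noisy * mask)
    return F.mse_loss(pred3d, pose3d)

model = MotionLifter()
loss = pretrain_step(model, torch.randn(2, 30, 17 * 2), torch.randn(2, 30, 17 * 3))
```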
arXiv Detail & Related papers (2022-10-12T19:46:25Z)
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge in learning the dynamics of appearance is that it requires a prohibitively large number of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z)
- Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis [56.550999933048075]
We propose a video-based synthesis method that tackles these challenges and demonstrates high-quality results for in-the-wild videos.
We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes.
We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
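The idea of modulating generator weights with a motion code can be illustrated with a StyleGAN2-style modulated convolution, as sketched below; this is a generic stand-in for the mechanism the summary mentions, not the paper's actual motion signature or generator.

```python
# Sketch: a convolution whose weights are modulated per sample by a motion code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionModulatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, motion_dim, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.to_scale = nn.Linear(motion_dim, in_ch)       # motion code -> per-channel scales

    def forward(self, x, motion_code):
        b = x.shape[0]
        scale = self.to_scale(motion_code).view(b, 1, -1, 1, 1) + 1.0
        w = self.weight.unsqueeze(0) * scale               # modulate the kernel per sample
        w = w / (w.pow(2).sum(dim=[2, 3, 4], keepdim=True).sqrt() + 1e-8)  # demodulate
        # A grouped conv applies a different modulated kernel to each sample in the batch.
        x = x.reshape(1, -1, *x.shape[2:])
        out = F.conv2d(x, w.reshape(-1, *w.shape[2:]), padding=1, groups=b)
        return out.reshape(b, -1, *out.shape[2:])

layer = MotionModulatedConv(16, 32, motion_dim=64)
y = layer(torch.randn(2, 16, 32, 32), torch.randn(2, 64))  # -> (2, 32, 32, 32)
```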
arXiv Detail & Related papers (2021-11-10T20:18:57Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
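A minimal sketch of the in-betweening idea: a small network takes two keyframe poses and a normalized time t and predicts the intermediate pose as a non-linear correction to a linear blend. The MLP form and dimensions are illustrative assumptions, not the paper's motion model.

```python
# Sketch: predict the skeletal pose between two keyframes as lerp + learned residual.
import torch
import torch.nn as nn

class PoseInbetweener(nn.Module):
    def __init__(self, num_joints=17, hidden=512):
        super().__init__()
        pose_dim = num_joints * 3
        self.net = nn.Sequential(
            nn.Linear(2 * pose_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, pose_a, pose_b, t):
        # Non-linear residual on top of the linear interpolation between the keyframes.
        lerp = (1 - t) * pose_a + t * pose_b
        return lerp + self.net(torch.cat([pose_a, pose_b, t], dim=-1))

model = PoseInbetweener()
pose_a, pose_b = torch.randn(4, 51), torch.randn(4, 51)
t = torch.rand(4, 1)                                    # normalized time of the in-between frame
mid = model(pose_a, pose_b, t)                          # (4, 51) predicted skeletal pose
# Training would minimize, e.g., MSE against ground-truth mocap poses at time t.
```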
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z)
- Do As I Do: Transferring Human Motion and Appearance between Monocular Videos with Spatial and Temporal Constraints [8.784162652042959]
Marker-less human motion estimation and shape modeling from images in the wild bring this challenge to the fore.
We propose a unifying formulation for transferring appearance and human motion from monocular videos.
Our method is able to transfer both human motion and appearance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T16:39:16Z)