Do As I Do: Transferring Human Motion and Appearance between Monocular
Videos with Spatial and Temporal Constraints
- URL: http://arxiv.org/abs/2001.02606v2
- Date: Tue, 21 Jan 2020 17:26:48 GMT
- Title: Do As I Do: Transferring Human Motion and Appearance between Monocular
Videos with Spatial and Temporal Constraints
- Authors: Thiago L. Gomes and Renato Martins and João Ferreira and Erickson R.
Nascimento
- Abstract summary: Marker-less human motion estimation and shape modeling from images in the wild bring this challenge to the fore.
We propose a unifying formulation for transferring appearance and human motion from monocular videos.
Our method is able to transfer both human motion and appearance, outperforming state-of-the-art methods.
- Score: 8.784162652042959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating plausible virtual actors from images of real actors remains one of
the key challenges in computer vision and computer graphics. Marker-less human
motion estimation and shape modeling from images in the wild bring this
challenge to the fore. Despite recent advances in view synthesis and
image-to-image translation, currently available formulations are limited to
transferring style alone and do not take into account the character's motion
and shape, which are by nature intermingled in producing plausible human forms.
In this paper, we propose a unifying formulation for transferring appearance
and retargeting human motion from monocular videos that accounts for all of
these aspects. Our method synthesizes new videos of people in a context
different from the one in which they were initially recorded. Unlike recent
appearance transfer methods, our approach takes into account body shape,
appearance, and motion constraints. The evaluation is performed through several
experiments on publicly available real videos with challenging conditions. Our
method transfers both human motion and appearance, outperforming
state-of-the-art methods while preserving specific features of the motion that
must be maintained (e.g., feet touching the floor, hands touching a particular
object) and achieving the best scores on visual quality and appearance metrics
such as Structural Similarity (SSIM) and Learned Perceptual Image Patch
Similarity (LPIPS).
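The abstract reports Structural Similarity (SSIM) and Learned Perceptual Image Patch Similarity (LPIPS) as its appearance metrics. As a minimal sketch (not the authors' evaluation code), per-frame scores of this kind are commonly computed with the scikit-image and lpips packages; the names real_frames and synth_frames below are hypothetical placeholders for lists of aligned uint8 RGB frames:

    import torch
    import lpips
    from skimage.metrics import structural_similarity

    def frame_scores(real, fake, lpips_fn):
        # SSIM over raw pixel values, averaged across color channels (higher is better).
        s = structural_similarity(real, fake, channel_axis=2, data_range=255)
        # LPIPS expects NCHW float tensors scaled to [-1, 1] (lower is better).
        to_tensor = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() / 127.5 - 1.0
        with torch.no_grad():
            d = lpips_fn(to_tensor(real), to_tensor(fake)).item()
        return s, d

    lpips_fn = lpips.LPIPS(net='alex')  # AlexNet backbone, the library's common default
    # scores = [frame_scores(r, f, lpips_fn) for r, f in zip(real_frames, synth_frames)]

Averaging such per-frame scores over a clip yields the video-level SSIM and LPIPS numbers typically reported in this line of work.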
Related papers
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our proposed method can produce visually convincing and more logical & natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z)
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large amount of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z)
- Action2video: Generating Videos of Human 3D Actions [31.665831044217363]
We aim to tackle the interesting yet challenging problem of generating videos of diverse and natural human motions from prescribed action categories.
The key issue lies in the ability to synthesize multiple distinct motion sequences that are realistic in their visual appearance.
Action2motion stochastically generates plausible 3D pose sequences of a prescribed action category, which are processed and rendered by motion2video to form 2D videos.
arXiv Detail & Related papers (2021-11-12T20:20:37Z)
- Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis [56.550999933048075]
We propose a video-based synthesis method that tackles these challenges and demonstrates high-quality results for in-the-wild videos.
We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes (see the sketch after this list for what such weight modulation can look like).
We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-11-10T20:18:57Z)
- On Development and Evaluation of Retargeting Human Motion and Appearance in Monocular Videos [2.870762512009438]
Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision.
We propose a novel and high-performing approach based on a hybrid image-based rendering technique that exhibits competitive visual quality.
We also present a new video benchmark dataset composed of different videos with annotated human motions to evaluate the task of synthesizing videos of people.
arXiv Detail & Related papers (2021-03-29T13:17:41Z)
- Style and Pose Control for Image Synthesis of Humans from a Single Monocular View [78.6284090004218]
StylePoseGAN extends a non-controllable generator to accept conditioning of pose and appearance separately.
Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts.
StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics.
arXiv Detail & Related papers (2021-02-22T18:50:47Z)
- High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)
- Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z)
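The "Dance In the Wild" entry above modulates generator weights with a per-frame motion signature. As a rough illustration only (the paper's own layer may differ), weight modulation in the StyleGAN2 sense can be sketched in PyTorch as follows, with signature standing in for the motion feature vector:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ModulatedConv2d(nn.Module):
        # Convolution whose kernel is rescaled per sample by a conditioning vector.
        def __init__(self, in_ch, out_ch, kernel_size, signature_dim):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
            self.affine = nn.Linear(signature_dim, in_ch)  # signature -> per-channel scales
            self.padding = kernel_size // 2

        def forward(self, x, signature):
            b, in_ch, h, w = x.shape
            out_ch = self.weight.shape[0]
            # Modulate: scale each input channel of the kernel, per sample.
            s = self.affine(signature).view(b, 1, in_ch, 1, 1)
            w = self.weight.unsqueeze(0) * s
            # Demodulate: renormalize so output activations keep roughly unit variance.
            w = w * torch.rsqrt((w ** 2).sum(dim=(2, 3, 4), keepdim=True) + 1e-8)
            # Grouped-convolution trick to apply a different kernel to each sample.
            x = x.reshape(1, b * in_ch, h, w)
            w = w.reshape(b * out_ch, in_ch, *w.shape[3:])
            out = F.conv2d(x, w, padding=self.padding, groups=b)
            return out.reshape(b, out_ch, h, w)

Driving such a layer with a motion-dependent vector instead of a static style code is one way a generator can produce pose- and velocity-dependent appearance changes such as cloth deformation.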
This list is automatically generated from the titles and abstracts of the papers on this site.