High-Fidelity Neural Human Motion Transfer from Monocular Video
- URL: http://arxiv.org/abs/2012.10974v1
- Date: Sun, 20 Dec 2020 16:54:38 GMT
- Title: High-Fidelity Neural Human Motion Transfer from Monocular Video
- Authors: Moritz Kappel and Vladislav Golyanik and Mohamed Elgharib and Jann-Ole Henningson
  and Hans-Peter Seidel and Susana Castillo and Christian Theobalt and Marcus Magnor
- Abstract summary: Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
- Score: 71.75576402562247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based human motion transfer creates video animations of humans
following a source motion. Current methods show remarkable results for
tightly-clad subjects. However, the lack of temporally consistent handling of
plausible clothing dynamics, including fine and high-frequency details,
significantly limits the attainable visual quality. We address these
limitations for the first time in the literature and present a new framework
which performs high-fidelity and temporally-consistent human motion transfer
with natural pose-dependent non-rigid deformations, for several types of loose
garments. In contrast to the previous techniques, we perform image generation
in three subsequent stages, synthesizing human shape, structure, and
appearance. Given a monocular RGB video of an actor, we train a stack of
recurrent deep neural networks that generate these intermediate representations
from 2D poses and their temporal derivatives. Splitting the difficult motion
transfer problem into subtasks that are aware of the temporal motion context
helps us to synthesize results with plausible dynamics and pose-dependent
detail. It also allows artistic control of results by manipulation of
individual framework stages. In the experimental results, we significantly
outperform the state-of-the-art in terms of video realism. Our code and data
will be made publicly available.
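As a rough illustration of the pipeline described in the abstract, the sketch below wires a stack of recurrent stages that map 2D poses and their temporal derivatives first to a shape map, then to a structure map, and finally to an appearance frame. All concrete choices here (GRU recurrence, resolutions, channel counts, module names) are assumptions made for illustration, not the authors' architecture.

```python
# Hypothetical sketch of the three-stage idea from the abstract:
# 2D poses + temporal derivatives -> shape -> structure -> appearance.
# Layer choices (GRU recurrence, 64x64 resolution, channel counts) are assumptions.
import torch
import torch.nn as nn


class RecurrentStage(nn.Module):
    """One pipeline stage: a GRU over a latent code followed by a small image decoder."""

    def __init__(self, in_dim, out_channels, hidden=256, res=64):
        super().__init__()
        self.res = res
        self.out_channels = out_channels
        self.gru = nn.GRUCell(in_dim, hidden)        # carries temporal context across frames
        self.decode = nn.Linear(hidden, out_channels * res * res)  # latent -> coarse output map

    def forward(self, x, h):
        h = self.gru(x, h)                           # update the temporal state
        img = self.decode(h).view(-1, self.out_channels, self.res, self.res)
        return img, h


class MotionTransferPipeline(nn.Module):
    """Shape -> structure -> appearance, each stage conditioned on the previous one."""

    def __init__(self, num_joints=25, res=64):
        super().__init__()
        pose_dim = num_joints * 2 * 2                # 2D poses plus their temporal derivatives
        self.shape = RecurrentStage(pose_dim, 1, res=res)                       # silhouette-like shape map
        self.structure = RecurrentStage(pose_dim + 1 * res * res, 3, res=res)   # body/garment structure
        self.appearance = RecurrentStage(pose_dim + 3 * res * res, 3, res=res)  # final RGB frame

    def forward(self, pose, pose_velocity, states):
        x = torch.cat([pose, pose_velocity], dim=-1)
        shape, h1 = self.shape(x, states[0])
        structure, h2 = self.structure(torch.cat([x, shape.flatten(1)], dim=-1), states[1])
        frame, h3 = self.appearance(torch.cat([x, structure.flatten(1)], dim=-1), states[2])
        return frame, (shape, structure), (h1, h2, h3)


# Toy usage on a short sequence: per-stage hidden states start at zero.
model = MotionTransferPipeline()
states = [torch.zeros(1, 256) for _ in range(3)]
for _ in range(4):                                   # iterate over video frames
    pose = torch.randn(1, 25 * 2)
    velocity = torch.randn(1, 25 * 2)
    frame, intermediates, states = model(pose, velocity, states)
print(frame.shape)  # torch.Size([1, 3, 64, 64])
```

Keeping the stages separate is what enables the per-stage artistic control mentioned in the abstract: an intermediate shape or structure map can be inspected or edited before the next stage consumes it.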
Related papers
- Machine Learning Modeling for Multi-order Human Visual Motion Processing [5.043066132820344]
This research aims to develop machines that learn to perceive visual motion as do humans.
Our model architecture mimics the cortical V1-MT motion processing pathway.
We trained our dual-pathway model on novel motion datasets with varying material properties of moving objects.
arXiv Detail & Related papers (2025-01-22T11:41:41Z)
- Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism [52.9091817868613]
Video try-on is a promising area with tremendous real-world potential.
Previous research has primarily focused on transferring product clothing images to videos with simple human poses.
We propose a novel video try-on framework based on the Diffusion Transformer (DiT), named Dynamic Try-On.
arXiv Detail & Related papers (2024-12-13T03:20:53Z)
- Do As I Do: Pose Guided Human Motion Copy [39.40271266234068]
Motion copy is an intriguing yet challenging task in artificial intelligence and computer vision.
Existing approaches typically adopt a conventional GAN with an L1 or L2 loss to produce the target fake video.
We present an episodic memory module in the pose-to-appearance generation to propel continuous learning.
Our method significantly outperforms state-of-the-art approaches and gains 7.2% and 12.4% improvements in PSNR and FID respectively.
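For context, a minimal sketch of the conventional objective mentioned above, an adversarial GAN term combined with an L1 reconstruction loss between the generated and target frames, looks roughly as follows. The generator and discriminator stand in for arbitrary pose-to-video networks, and the loss weighting is an assumption.

```python
# Sketch of a conventional GAN + L1 objective for pose-to-video generation.
# `discriminator` is any callable producing real/fake logits; l1_weight is illustrative.
import torch
import torch.nn.functional as F

def generator_loss(discriminator, fake_frame, real_frame, l1_weight=10.0):
    # Non-saturating adversarial term: the generator wants D(fake) -> 1.
    logits = discriminator(fake_frame)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # L1 reconstruction term pulls the generated frame toward the ground-truth frame.
    recon = F.l1_loss(fake_frame, real_frame)
    return adv + l1_weight * recon

def discriminator_loss(discriminator, fake_frame, real_frame):
    real_logits = discriminator(real_frame)
    fake_logits = discriminator(fake_frame.detach())   # do not backprop into the generator here
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

# Toy usage with a trivial patch discriminator:
disc = torch.nn.Conv2d(3, 1, 4, stride=2, padding=1)
fake, real = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
g_loss, d_loss = generator_loss(disc, fake, real), discriminator_loss(disc, fake, real)
```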
arXiv Detail & Related papers (2024-06-24T12:41:51Z)
- MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
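A rough sketch of such a pretraining setup is given below, with a plain Transformer encoder standing in for the dual-stream DSTformer: 2D keypoint sequences are corrupted with noise and random masking, and the encoder is trained to regress the underlying 3D motion. All dimensions and corruption rates are illustrative assumptions.

```python
# Sketch: lift noisy, partially masked 2D keypoint sequences to 3D motion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionLifter(nn.Module):
    def __init__(self, num_joints=17, dim=256, layers=4):
        super().__init__()
        self.embed = nn.Linear(num_joints * 2, dim)                  # per-frame 2D pose -> token
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, num_joints * 3)                   # token -> per-frame 3D pose

    def forward(self, pose2d):                                       # (B, T, J*2)
        return self.head(self.encoder(self.embed(pose2d)))           # (B, T, J*3)

def pretrain_step(model, pose2d, pose3d, noise_std=0.02, mask_prob=0.15):
    # Simulate "noisy partial 2D observations": Gaussian jitter plus randomly zeroed entries.
    noisy = pose2d + noise_std * torch.randn_like(pose2d)
    mask = (torch.rand_like(pose2d) > mask_prob).float()
    pred3d = model(noisy * mask)
    return F.mse_loss(pred3d, pose3d)

model = MotionLifter()
loss = pretrain_step(model, torch.randn(2, 30, 17 * 2), torch.randn(2, 30, 17 * 3))
```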
arXiv Detail & Related papers (2022-10-12T19:46:25Z)
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge in learning the dynamics of appearance is that it requires a prohibitively large number of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z)
- Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis [56.550999933048075]
We propose a video-based synthesis method that tackles these challenges and demonstrates high-quality results for in-the-wild videos.
We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes.
We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
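The idea of modulating generator weights with a motion code can be illustrated with a StyleGAN2-style modulated convolution, as sketched below; this is a generic stand-in for the mechanism the summary mentions, not the paper's actual motion signature or generator.

```python
# Sketch: a convolution whose weights are modulated per sample by a motion code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionModulatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, motion_dim, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.to_scale = nn.Linear(motion_dim, in_ch)       # motion code -> per-channel scales

    def forward(self, x, motion_code):
        b = x.shape[0]
        scale = self.to_scale(motion_code).view(b, 1, -1, 1, 1) + 1.0
        w = self.weight.unsqueeze(0) * scale               # modulate the kernel per sample
        w = w / (w.pow(2).sum(dim=[2, 3, 4], keepdim=True).sqrt() + 1e-8)  # demodulate
        # A grouped conv applies a different modulated kernel to each sample in the batch.
        x = x.reshape(1, -1, *x.shape[2:])
        out = F.conv2d(x, w.reshape(-1, *w.shape[2:]), padding=1, groups=b)
        return out.reshape(b, -1, *out.shape[2:])

layer = MotionModulatedConv(16, 32, motion_dim=64)
y = layer(torch.randn(2, 16, 32, 32), torch.randn(2, 64))  # -> (2, 32, 32, 32)
```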
arXiv Detail & Related papers (2021-11-10T20:18:57Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
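A minimal sketch of the in-betweening idea: a small network takes two keyframe poses and a normalized time t and predicts the intermediate pose as a non-linear correction to a linear blend. The MLP form and dimensions are illustrative assumptions, not the paper's motion model.

```python
# Sketch: predict the skeletal pose between two keyframes as lerp + learned residual.
import torch
import torch.nn as nn

class PoseInbetweener(nn.Module):
    def __init__(self, num_joints=17, hidden=512):
        super().__init__()
        pose_dim = num_joints * 3
        self.net = nn.Sequential(
            nn.Linear(2 * pose_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, pose_a, pose_b, t):
        # Non-linear residual on top of the linear interpolation between the keyframes.
        lerp = (1 - t) * pose_a + t * pose_b
        return lerp + self.net(torch.cat([pose_a, pose_b, t], dim=-1))

model = PoseInbetweener()
pose_a, pose_b = torch.randn(4, 51), torch.randn(4, 51)
t = torch.rand(4, 1)                                    # normalized time of the in-between frame
mid = model(pose_a, pose_b, t)                          # (4, 51) predicted skeletal pose
# Training would minimize, e.g., MSE against ground-truth mocap poses at time t.
```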
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z)
- Do As I Do: Transferring Human Motion and Appearance between Monocular Videos with Spatial and Temporal Constraints [8.784162652042959]
Marker-less human motion estimation and shape modeling from images in the wild bring this challenge to the fore.
We propose a unifying formulation for transferring appearance and human motion from monocular videos.
Our method is able to transfer both human motion and appearance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T16:39:16Z)