Human Motion Transfer from Poses in the Wild
- URL: http://arxiv.org/abs/2004.03142v1
- Date: Tue, 7 Apr 2020 05:59:53 GMT
- Title: Human Motion Transfer from Poses in the Wild
- Authors: Jian Ren, Menglei Chai, Sergey Tulyakov, Chen Fang, Xiaohui Shen,
Jianchao Yang
- Abstract summary: We tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.
It is a video-to-video translation task in which the estimated poses are used to bridge two domains.
We introduce a novel pose-to-video translation framework for generating high-quality videos that are temporally coherent even for in-the-wild pose sequences unseen during training.
- Score: 61.6016458288803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the problem of human motion transfer, where we
synthesize novel motion video for a target person that imitates the movement
from a reference video. It is a video-to-video translation task in which the
estimated poses are used to bridge the two domains. Despite substantial
progress on the topic, several problems remain with previous methods. First,
there is a domain gap between training and testing pose sequences: the model is
tested on poses it has not seen during training, such as difficult dancing
moves. Furthermore, pose detection errors are inevitable, making the
generator's job harder. Finally, generating realistic pixels from sparse poses is
challenging in a single step. To address these challenges, we introduce a novel
pose-to-video translation framework for generating high-quality videos that are
temporally coherent even for in-the-wild pose sequences unseen during training.
We propose a pose augmentation method to minimize the training-test gap, a
unified paired and unpaired learning strategy to improve robustness to
detection errors, and a two-stage network architecture to achieve superior
texture quality. To further boost research on the topic, we build two human
motion datasets. Finally, we show the superiority of our approach over
state-of-the-art methods through extensive experiments and evaluations on
different datasets.
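
The abstract does not spell out the pose augmentation, but one plausible reading is perturbing detected keypoints during training so the generator sees detector-like noise and unfamiliar joint configurations. A minimal sketch, assuming 2D keypoints stored as (x, y, confidence) rows; the function name, noise model, and joint-dropout scheme are illustrative assumptions, not the authors' code:

```python
import numpy as np

def augment_pose(keypoints, jitter_std=2.0, drop_prob=0.05, rng=None):
    """Hypothetical pose augmentation: jitter 2D joints and randomly
    drop some to mimic detector noise and unseen poses at test time.

    keypoints: (J, 3) array of (x, y, confidence) per joint.
    """
    rng = rng or np.random.default_rng()
    out = keypoints.copy()
    # Gaussian jitter on the (x, y) coordinates of every joint.
    out[:, :2] += rng.normal(0.0, jitter_std, size=(len(out), 2))
    # Simulate missed detections by zeroing a few joints' confidence.
    dropped = rng.random(len(out)) < drop_prob
    out[dropped, 2] = 0.0
    return out
```

In practice such augmentation would be tuned to mimic the failure statistics of the actual pose detector used at test time.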
Related papers
- Do As I Do: Pose Guided Human Motion Copy [39.40271266234068]
Motion copy is an intriguing yet challenging task in artificial intelligence and computer vision.
Existing approaches typically adopt a conventional GAN with an L1 or L2 loss to produce the target fake video.
We present an episodic memory module in the pose-to-appearance generation to propel continuous learning.
Our method significantly outperforms state-of-the-art approaches, gaining improvements of 7.2% in PSNR and 12.4% in FID.
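
For context on the "conventional GAN with an L1 or L2 loss" baseline mentioned above, the generator objective typically sums an adversarial term and a pixel reconstruction term. A hedged sketch in PyTorch; the non-saturating loss and pix2pix-style weighting are assumptions, not this paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def generator_loss(fake_frames, real_frames, disc_fake_logits, l1_weight=100.0):
    """Conventional GAN generator objective: adversarial term plus an
    L1 reconstruction term (the 100x weighting follows pix2pix and is
    an assumption here)."""
    # Non-saturating adversarial loss: push D's logits on fakes toward "real".
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # Pixel-wise L1 between generated and ground-truth frames.
    recon = F.l1_loss(fake_frames, real_frames)
    return adv + l1_weight * recon
```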
arXiv Detail & Related papers (2024-06-24T12:41:51Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline requires only low-frame-rate videos and unpaired human motion data for training; no high-frame-rate videos are needed.
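
The non-linear motion model is best understood against the naive baseline it replaces: linearly interpolating joint positions between low-frame-rate keyframes, which cannot capture arcs or acceleration. A sketch of that baseline (shapes and names are assumptions):

```python
import numpy as np

def lerp_skeletons(pose_a, pose_b, num_inbetween):
    """Naive baseline: linearly interpolate joint positions between two
    keyframe skeletons. A learned motion model would replace this to
    capture non-linear motion (arcs, acceleration, follow-through).

    pose_a, pose_b: (J, 2) arrays of 2D joint positions.
    Returns: (num_inbetween, J, 2) interpolated skeletons.
    """
    ts = np.linspace(0.0, 1.0, num_inbetween + 2)[1:-1]  # exclude endpoints
    return np.stack([(1 - t) * pose_a + t * pose_b for t in ts])
```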
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- On Development and Evaluation of Retargeting Human Motion and Appearance in Monocular Videos [2.870762512009438]
Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision.
We propose a novel, high-performing approach based on a hybrid image-based rendering technique that exhibits competitive visual quality.
We also present a new video benchmark dataset, composed of different videos with annotated human motions, for evaluating the task of synthesizing videos of people.
arXiv Detail & Related papers (2021-03-29T13:17:41Z)
- Deep Dual Consecutive Network for Human Pose Estimation [44.41818683253614]
We propose a novel multi-frame human pose estimation framework, leveraging abundant temporal cues between video frames to facilitate keypoint detection.
Our method ranks No. 1 in the Multi-frame Person Pose Challenge on the large-scale benchmark datasets PoseTrack 2017 and PoseTrack 2018.
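
One simple way to exploit temporal cues for keypoint detection is to fuse per-frame heatmaps from a short window before locating joints; the actual framework uses learned aggregation, so the weighted-average sketch below is purely illustrative:

```python
import numpy as np

def fuse_heatmaps(heatmaps, weights=None):
    """Illustrative temporal fusion: average keypoint heatmaps from
    neighbouring frames to stabilise the current frame's detections.

    heatmaps: (T, J, H, W) array, one heatmap per joint per frame.
    Returns: (J, 2) array of (x, y) keypoint locations.
    """
    T = heatmaps.shape[0]
    if weights is None:
        # Weight the centre frame most, neighbours less (assumed scheme).
        centre = T // 2
        weights = np.exp(-0.5 * (np.arange(T) - centre) ** 2)
        weights /= weights.sum()
    fused = np.tensordot(weights, heatmaps, axes=(0, 0))  # (J, H, W)
    # Keypoint locations are the per-joint argmax of the fused maps.
    J, H, W = fused.shape
    idx = fused.reshape(J, -1).argmax(axis=1)
    return np.stack([idx % W, idx // W], axis=1)
```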
arXiv Detail & Related papers (2021-03-12T13:11:27Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge in motion generation with robot learning-from-demonstration techniques is that human demonstrations follow a distribution with multiple modes for a single task query.
Previous approaches fail to capture all modes, or tend to average the modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
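
The mode-averaging failure is easy to see numerically: if demonstrations pass an obstacle either on the left or on the right, a model regressing the mean produces a path through the middle. A toy example with invented coordinates:

```python
import numpy as np

# Two demonstration modes for the same task query: pass an obstacle at
# x = 0 on the left (y = -1) or on the right (y = +1). Toy data.
left_demo = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0]])
right_demo = np.array([[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

# Averaging the modes yields a midpoint of [0, 0]: straight through
# the obstacle, an invalid trajectory neither demonstration contains.
mean_traj = (left_demo + right_demo) / 2
print(mean_traj[1])  # -> [0. 0.]
```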
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)
- Single-Shot Freestyle Dance Reenactment [89.91619150027265]
The task of motion transfer between a source dancer and a target person is a special case of the pose transfer problem.
We propose a novel method that can reanimate a single image by arbitrary video sequences, unseen during training.
arXiv Detail & Related papers (2020-12-02T12:57:43Z)
- Towards Accurate Human Pose Estimation in Videos of Crowded Scenes [134.60638597115872]
We focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data.
For a given frame, we propagate historical poses forward from previous frames and future poses backward from subsequent frames to the current frame, leading to stable and accurate human pose estimation in videos.
In this way, our model achieves the best performance on 7 out of 13 videos and an average w_AP of 56.33 on the HIE challenge test set.
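
The forward/backward scheme can be pictured as neighbouring frames voting on the current frame's pose. A minimal sketch, assuming a simple linear-motion propagation model in place of the paper's learned one:

```python
import numpy as np

def refine_with_neighbours(prev_pose, curr_pose, next_pose, alpha=0.5):
    """Illustrative temporal refinement: poses from the previous and
    next frames vote on the current frame's pose, stabilising noisy
    per-frame detections.

    Each pose: (J, 2) array of 2D joint positions; alpha weights the
    current detection against the temporal prediction (assumed value).
    """
    # Simplest propagation model: the current pose should lie midway
    # between its temporal neighbours (linear-motion assumption).
    temporal = 0.5 * (prev_pose + next_pose)
    # Blend the frame's own detection with the temporal prediction.
    return alpha * curr_pose + (1 - alpha) * temporal
```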
arXiv Detail & Related papers (2020-10-16T13:19:11Z)