Human Motion Transfer from Poses in the Wild
- URL: http://arxiv.org/abs/2004.03142v1
- Date: Tue, 7 Apr 2020 05:59:53 GMT
- Title: Human Motion Transfer from Poses in the Wild
- Authors: Jian Ren, Menglei Chai, Sergey Tulyakov, Chen Fang, Xiaohui Shen,
Jianchao Yang
- Abstract summary: We tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.
It is a video-to-video translation task in which the estimated poses are used to bridge two domains.
We introduce a novel pose-to-video translation framework for generating high-quality videos that are temporally coherent even for in-the-wild pose sequences unseen during training.
- Score: 61.6016458288803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the problem of human motion transfer, where we
synthesize novel motion video for a target person that imitates the movement
from a reference video. It is a video-to-video translation task in which the
estimated poses are used to bridge the two domains. Despite substantial
progress on the topic, several problems remain with previous methods. First,
there is a domain gap between training and testing pose sequences: the model is
tested on poses it has not seen during training, such as difficult dancing
moves. Furthermore, pose detection errors are inevitable, making the
generator's job harder. Finally, generating realistic pixels from sparse poses is
challenging in a single step. To address these challenges, we introduce a novel
pose-to-video translation framework for generating high-quality videos that are
temporally coherent even for in-the-wild pose sequences unseen during training.
We propose a pose augmentation method to minimize the training-test gap, a
unified paired and unpaired learning strategy to improve robustness to
detection errors, and a two-stage network architecture to achieve superior
texture quality. To further boost research on the topic, we build two human
motion datasets. Finally, we show the superiority of our approach over
state-of-the-art methods through extensive experiments and evaluations on
different datasets.
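
The abstract does not spell out the pose augmentation, but one plausible reading is perturbing detected keypoints during training so the generator sees detector-like noise and unfamiliar joint configurations. A minimal sketch, assuming 2D keypoints stored as (x, y, confidence) rows; the function name, noise model, and joint-dropout scheme are illustrative assumptions, not the authors' code:

```python
import numpy as np

def augment_pose(keypoints, jitter_std=2.0, drop_prob=0.05, rng=None):
    """Hypothetical pose augmentation: jitter 2D joints and randomly
    drop some to mimic detector noise and unseen poses at test time.

    keypoints: (J, 3) array of (x, y, confidence) per joint.
    """
    rng = rng or np.random.default_rng()
    out = keypoints.copy()
    # Gaussian jitter on the (x, y) coordinates of every joint.
    out[:, :2] += rng.normal(0.0, jitter_std, size=(len(out), 2))
    # Simulate missed detections by zeroing a few joints' confidence.
    dropped = rng.random(len(out)) < drop_prob
    out[dropped, 2] = 0.0
    return out
```

In practice such augmentation would be tuned to mimic the failure statistics of the actual pose detector used at test time.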
Related papers
- Do As I Do: Pose Guided Human Motion Copy [39.40271266234068]
Motion copy is an intriguing yet challenging task in artificial intelligence and computer vision.
Existing approaches typically adopt a conventional GAN with an L1 or L2 loss to produce the target fake video.
We present an episodic memory module in the pose-to-appearance generation to propel continuous learning.
Our method significantly outperforms state-of-the-art approaches, gaining improvements of 7.2% in PSNR and 12.4% in FID.
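
For context on the "conventional GAN with an L1 or L2 loss" baseline mentioned above, the generator objective typically sums an adversarial term and a pixel reconstruction term. A hedged sketch in PyTorch; the non-saturating loss and pix2pix-style weighting are assumptions, not this paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def generator_loss(fake_frames, real_frames, disc_fake_logits, l1_weight=100.0):
    """Conventional GAN generator objective: adversarial term plus an
    L1 reconstruction term (the 100x weighting follows pix2pix and is
    an assumption here)."""
    # Non-saturating adversarial loss: push D's logits on fakes toward "real".
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # Pixel-wise L1 between generated and ground-truth frames.
    recon = F.l1_loss(fake_frames, real_frames)
    return adv + l1_weight * recon
```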
arXiv Detail & Related papers (2024-06-24T12:41:51Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline requires only low-frame-rate videos and unpaired human motion data for training; no high-frame-rate videos are needed.
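
The non-linear motion model is best understood against the naive baseline it replaces: linearly interpolating joint positions between low-frame-rate keyframes, which cannot capture arcs or acceleration. A sketch of that baseline (shapes and names are assumptions):

```python
import numpy as np

def lerp_skeletons(pose_a, pose_b, num_inbetween):
    """Naive baseline: linearly interpolate joint positions between two
    keyframe skeletons. A learned motion model would replace this to
    capture non-linear motion (arcs, acceleration, follow-through).

    pose_a, pose_b: (J, 2) arrays of 2D joint positions.
    Returns: (num_inbetween, J, 2) interpolated skeletons.
    """
    ts = np.linspace(0.0, 1.0, num_inbetween + 2)[1:-1]  # exclude endpoints
    return np.stack([(1 - t) * pose_a + t * pose_b for t in ts])
```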
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- On Development and Evaluation of Retargeting Human Motion and Appearance in Monocular Videos [2.870762512009438]
Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision.
We propose a novel, high-performing approach based on a hybrid image-based rendering technique that exhibits competitive visual quality.
We also present a new video benchmark dataset, composed of different videos with annotated human motions, for evaluating the task of synthesizing videos of people.
arXiv Detail & Related papers (2021-03-29T13:17:41Z)
- Deep Dual Consecutive Network for Human Pose Estimation [44.41818683253614]
We propose a novel multi-frame human pose estimation framework, leveraging abundant temporal cues between video frames to facilitate keypoint detection.
Our method ranks No. 1 in the Multi-frame Person Pose Challenge on the large-scale benchmark datasets PoseTrack 2017 and PoseTrack 2018.
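
One simple way to exploit temporal cues for keypoint detection is to fuse per-frame heatmaps from a short window before locating joints; the actual framework uses learned aggregation, so the weighted-average sketch below is purely illustrative:

```python
import numpy as np

def fuse_heatmaps(heatmaps, weights=None):
    """Illustrative temporal fusion: average keypoint heatmaps from
    neighbouring frames to stabilise the current frame's detections.

    heatmaps: (T, J, H, W) array, one heatmap per joint per frame.
    Returns: (J, 2) array of (x, y) keypoint locations.
    """
    T = heatmaps.shape[0]
    if weights is None:
        # Weight the centre frame most, neighbours less (assumed scheme).
        centre = T // 2
        weights = np.exp(-0.5 * (np.arange(T) - centre) ** 2)
        weights /= weights.sum()
    fused = np.tensordot(weights, heatmaps, axes=(0, 0))  # (J, H, W)
    # Keypoint locations are the per-joint argmax of the fused maps.
    J, H, W = fused.shape
    idx = fused.reshape(J, -1).argmax(axis=1)
    return np.stack([idx % W, idx // W], axis=1)
```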
arXiv Detail & Related papers (2021-03-12T13:11:27Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge in motion generation with robot learning-from-demonstration techniques is that human demonstrations follow a distribution with multiple modes for a single task query.
Previous approaches fail to capture all modes, or tend to average the modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
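
The mode-averaging failure is easy to see numerically: if demonstrations pass an obstacle either on the left or on the right, a model regressing the mean produces a path through the middle. A toy example with invented coordinates:

```python
import numpy as np

# Two demonstration modes for the same task query: pass an obstacle at
# x = 0 on the left (y = -1) or on the right (y = +1). Toy data.
left_demo = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0]])
right_demo = np.array([[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

# Averaging the modes yields a midpoint of [0, 0]: straight through
# the obstacle, an invalid trajectory neither demonstration contains.
mean_traj = (left_demo + right_demo) / 2
print(mean_traj[1])  # -> [0. 0.]
```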
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)
- Single-Shot Freestyle Dance Reenactment [89.91619150027265]
The task of motion transfer between a source dancer and a target person is a special case of the pose transfer problem.
We propose a novel method that can reanimate a single image by arbitrary video sequences, unseen during training.
arXiv Detail & Related papers (2020-12-02T12:57:43Z)
- Towards Accurate Human Pose Estimation in Videos of Crowded Scenes [134.60638597115872]
We focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data.
For a given frame, we propagate historical poses forward from previous frames and future poses backward from subsequent frames to the current frame, leading to stable and accurate human pose estimation in videos.
In this way, our model achieves the best performance on 7 out of 13 videos and an average w_AP of 56.33 on the HIE challenge test set.
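
The forward/backward scheme can be pictured as neighbouring frames voting on the current frame's pose. A minimal sketch, assuming a simple linear-motion propagation model in place of the paper's learned one:

```python
import numpy as np

def refine_with_neighbours(prev_pose, curr_pose, next_pose, alpha=0.5):
    """Illustrative temporal refinement: poses from the previous and
    next frames vote on the current frame's pose, stabilising noisy
    per-frame detections.

    Each pose: (J, 2) array of 2D joint positions; alpha weights the
    current detection against the temporal prediction (assumed value).
    """
    # Simplest propagation model: the current pose should lie midway
    # between its temporal neighbours (linear-motion assumption).
    temporal = 0.5 * (prev_pose + next_pose)
    # Blend the frame's own detection with the temporal prediction.
    return alpha * curr_pose + (1 - alpha) * temporal
```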
arXiv Detail & Related papers (2020-10-16T13:19:11Z)