REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer
- URL: http://arxiv.org/abs/2209.00475v1
- Date: Thu, 1 Sep 2022 14:03:51 GMT
- Title: REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer
- Authors: Quanwei Yang, Xinchen Liu, Wu Liu, Hongtao Xie, Xiaoyan Gu, Lingyun
Yu, Yongdong Zhang
- Abstract summary: Given an image of a source person, Human Video Motion Transfer (HVMT) aims to generate a video of that person imitating the motion of a driving person.
Existing methods for HVMT mainly exploit Generative Adversarial Networks (GANs) to perform the warping operation.
This paper presents a novel REgion-to-whole human MOtion Transfer (REMOT) framework based on GANs.
- Score: 96.64111294772141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given an image of a source person, Human Video Motion Transfer
(HVMT) aims to generate a video of that person imitating the motion of a
driving person.
Existing methods for HVMT mainly exploit Generative Adversarial Networks (GANs)
to perform the warping operation based on the flow estimated from the source
person image and each driving video frame. However, these methods always
generate obvious artifacts due to the dramatic differences in poses, scales,
and shifts between the source person and the driving person. To overcome these
challenges, this paper presents a novel REgion-to-whole human MOtion Transfer
(REMOT) framework based on GANs. To generate realistic motions, REMOT adopts a
progressive generation paradigm: it first generates each body part in the
driving pose without flow-based warping, then composites all parts into a
complete person performing the driving motion. Moreover, to preserve the natural global
appearance, we design a Global Alignment Module to align the scale and position
of the source person with those of the driving person based on their layouts.
Furthermore, we propose a Texture Alignment Module to keep each part of the
person aligned according to texture similarity. Finally, extensive
quantitative and qualitative experiments show that REMOT achieves
state-of-the-art results on two public benchmarks.
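Below is a minimal sketch of the progressive region-to-whole pipeline the abstract describes: globally align the source to the driving layout, generate each body part in the driving pose without flow-based warping, then composite the parts into a whole person. All module names, shapes, and the alignment heuristic are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class REMOTSketch(nn.Module):
    """Illustrative region-to-whole pipeline (not the authors' code):
    (1) globally align the source to the driving layout, (2) generate
    each body part in the driving pose without flow-based warping,
    (3) composite all parts into a complete person."""

    def __init__(self, num_parts: int = 8):
        super().__init__()
        # Hypothetical stand-ins for the paper's part generators.
        self.part_generators = nn.ModuleList(
            [nn.Conv2d(3 + 3, 3, kernel_size=3, padding=1)
             for _ in range(num_parts)])
        self.compositor = nn.Conv2d(3 * num_parts, 3, kernel_size=3, padding=1)

    @staticmethod
    def global_align(source, src_layout, drv_layout):
        # Assumed behavior of the Global Alignment Module: rescale the
        # source so the spread of its layout matches the driving layout.
        scale = (drv_layout.std() / (src_layout.std() + 1e-6)).clamp(0.5, 2.0)
        return F.interpolate(source, scale_factor=float(scale),
                             mode="bilinear", align_corners=False,
                             recompute_scale_factor=True)

    def forward(self, source, src_layout, drv_layout, drv_pose, part_masks):
        # source: (B,3,H,W); drv_pose: (B,3,H,W) pose map;
        # part_masks: (B,num_parts,H,W) body-part layout masks.
        aligned = self.global_align(source, src_layout, drv_layout)
        aligned = F.interpolate(aligned, size=drv_pose.shape[-2:],
                                mode="bilinear", align_corners=False)
        # Region stage: synthesize each part in the driving pose.
        parts = [gen(torch.cat([aligned * part_masks[:, i:i + 1], drv_pose], 1))
                 for i, gen in enumerate(self.part_generators)]
        # Whole stage: composite the parts into the final person image.
        return self.compositor(torch.cat(parts, dim=1))
```

The point the sketch illustrates is that each part generator conditions on the driving pose directly, so no per-pixel flow between the source image and the driving frame is required.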
Related papers
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
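As a rough illustration of what a token-significance-driven noise schedule could look like, the sketch below masks less significant motion tokens earlier in the forward diffusion; the schedule form and score range are assumptions, not the M2DM formulation.

```python
import torch

def significance_noise_schedule(significance, t, num_steps=100):
    """Hypothetical per-token masking schedule for discrete diffusion:
    tokens with higher significance are corrupted later, so the model
    learns to recover important tokens from less important context.

    significance: (seq_len,) scores in [0, 1]; t: step in [0, num_steps].
    Returns per-token masking probabilities at step t."""
    base = torch.full_like(significance, t / num_steps)  # shared linear ramp
    # Important tokens get a smaller mask probability at the same step.
    return (base - 0.5 * significance * (1.0 - base)).clamp(0.0, 1.0)

# Example: per-token mask probabilities halfway through the forward process.
sig = torch.tensor([0.9, 0.1, 0.5, 0.2])
probs = significance_noise_schedule(sig, t=50)   # lowest for sig=0.9
mask = torch.bernoulli(probs)                    # 1 = replace with [MASK]
```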
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
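A hedged sketch of the global-plus-local idea: driving tokens attend to source tokens once at a coarse scale (large motion) and once at full resolution (subtle motion). The pooling factor and fusion by addition are assumptions, not the Human MotionFormer architecture.

```python
import torch
import torch.nn as nn

class GlobalLocalMatcher(nn.Module):
    """Illustrative global-plus-local matcher (assumed, not the
    authors' code): cross-attention at a coarse scale captures large
    motion, and at full scale captures subtle motion."""

    def __init__(self, dim=256, heads=8, pool=4):
        super().__init__()
        self.pool = pool
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, src_tokens, drv_tokens):   # both (B, T, dim)
        # Global branch: match coarsely subsampled token sequences.
        s, d = src_tokens[:, ::self.pool], drv_tokens[:, ::self.pool]
        g, _ = self.global_attn(d, s, s)
        # Broadcast coarse matches back to the full sequence length.
        g = g.repeat_interleave(self.pool, dim=1)[:, :drv_tokens.size(1)]
        # Local branch: refine with token-level matching, then fuse.
        l, _ = self.local_attn(drv_tokens, src_tokens, src_tokens)
        return g + l
```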
arXiv Detail & Related papers (2023-02-22T11:42:44Z)
- Human Motion Diffusion Model [35.05219668478535]
Motion Diffusion Model (MDM) is a transformer-based generative model for the human motion domain.
We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion.
arXiv Detail & Related papers (2022-09-29T16:27:53Z)
- Motion and Appearance Adaptation for Cross-Domain Motion Transfer [36.98500700394921]
Motion transfer aims to transfer the motion of a driving video to a source image.
Traditional single domain motion transfer approaches often produce notable artifacts.
We propose a Motion and Appearance Adaptation (MAA) approach for cross-domain motion transfer.
arXiv Detail & Related papers (2022-09-29T03:24:47Z)
- SAGA: Stochastic Whole-Body Grasping with Contact [60.43627793243098]
Human grasping synthesis has numerous applications including AR/VR, video games, and robotics.
In this work, our goal is to synthesize whole-body grasping motion. Given a 3D object, we aim to generate diverse and natural whole-body human motions that approach and grasp the object.
arXiv Detail & Related papers (2021-12-19T10:15:30Z)
- Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
- Task-Generic Hierarchical Human Motion Prior using VAEs [44.356707509079044]
A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks.
We present a method for learning complex human motions independent of specific tasks using a combined global and local latent space.
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation.
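One plausible reading of a combined global and local latent space is a sequence VAE with a single per-sequence code plus per-frame codes; the sketch below is such a construction with assumed dimensions and encoders, not the authors' model.

```python
import torch
import torch.nn as nn

class HierarchicalMotionVAE(nn.Module):
    """Illustrative VAE with combined global and local latents (an
    assumed reading of the paper): one global code summarizes the whole
    sequence, per-frame local codes capture short-term detail."""

    def __init__(self, pose_dim=63, local_dim=16, global_dim=64, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(pose_dim, hidden, batch_first=True)
        self.to_global = nn.Linear(hidden, 2 * global_dim)  # mu, logvar
        self.to_local = nn.Linear(hidden, 2 * local_dim)    # per frame
        self.decoder = nn.GRU(local_dim + global_dim, hidden, batch_first=True)
        self.to_pose = nn.Linear(hidden, pose_dim)

    @staticmethod
    def reparam(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, poses):                      # (B, T, pose_dim)
        h, last = self.encoder(poses)              # h: (B, T, hidden)
        z_global = self.reparam(self.to_global(last[-1]))  # (B, global_dim)
        z_local = self.reparam(self.to_local(h))           # (B, T, local_dim)
        z = torch.cat([z_local,
                       z_global.unsqueeze(1).expand(-1, poses.size(1), -1)],
                      dim=-1)
        out, _ = self.decoder(z)
        return self.to_pose(out)                   # reconstructed poses
```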
arXiv Detail & Related papers (2021-06-07T23:11:42Z)
- Scene-aware Generative Network for Human Motion Synthesis [125.21079898942347]
We propose a new framework that takes the interaction between the scene and the human motion into account.
Considering the uncertainty of human motion, we formulate this task as a generative task.
We derive a GAN based learning approach, with discriminators to enforce the compatibility between the human motion and the contextual scene.
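The compatibility discriminator can be pictured as a network that scores a motion sequence jointly with a scene encoding; the sketch below pairs such a discriminator with standard GAN losses. The architecture and loss form are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SceneMotionDiscriminator(nn.Module):
    """Illustrative discriminator (an assumption, not the paper's exact
    network) scoring whether a motion sequence is compatible with a
    given scene encoding."""

    def __init__(self, motion_dim=63, scene_dim=256, hidden=128):
        super().__init__()
        self.motion_enc = nn.GRU(motion_dim, hidden, batch_first=True)
        self.score = nn.Sequential(
            nn.Linear(hidden + scene_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, motion, scene_feat):         # (B,T,D), (B,scene_dim)
        _, h = self.motion_enc(motion)
        return self.score(torch.cat([h[-1], scene_feat], dim=-1))

def gan_losses(disc, real_motion, fake_motion, scene_feat):
    """Standard non-saturating GAN losses with the scene as context."""
    bce = nn.BCEWithLogitsLoss()
    d_real = disc(real_motion, scene_feat)
    d_fake = disc(fake_motion.detach(), scene_feat)
    d_loss = (bce(d_real, torch.ones_like(d_real)) +
              bce(d_fake, torch.zeros_like(d_fake)))
    g_out = disc(fake_motion, scene_feat)
    g_loss = bce(g_out, torch.ones_like(g_out))  # generator tries to fool D
    return d_loss, g_loss
```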
arXiv Detail & Related papers (2021-05-31T09:05:50Z)
- Socially and Contextually Aware Human Motion and Pose Forecasting [48.083060946226]
We propose a novel framework to tackle both tasks of human motion forecasting and body skeleton pose forecasting.
We consider incorporating both scene and social contexts, as critical clues for this prediction task.
Our proposed framework achieves superior performance compared to several baselines on two social datasets.
arXiv Detail & Related papers (2020-07-14T06:12:13Z)