Action2video: Generating Videos of Human 3D Actions
- URL: http://arxiv.org/abs/2111.06925v1
- Date: Fri, 12 Nov 2021 20:20:37 GMT
- Title: Action2video: Generating Videos of Human 3D Actions
- Authors: Chuan Guo, Xinxin Zuo, Sen Wang, Xinshuang Liu, Shihao Zou, Minglun
Gong, Li Cheng
- Abstract summary: We aim to tackle the interesting yet challenging problem of generating videos of diverse and natural human motions from prescribed action categories.
Key issue lies in the ability to synthesize multiple distinct motion sequences that are realistic in their visual appearances.
Action2motionally generates plausible 3D pose sequences of a prescribed action category, which are processed and rendered by motion2video to form 2D videos.
- Score: 31.665831044217363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We aim to tackle the interesting yet challenging problem of generating videos
of diverse and natural human motions from prescribed action categories. The key
issue lies in the ability to synthesize multiple distinct motion sequences that
are realistic in their visual appearances. It is achieved in this paper by a
two-step process that maintains internal 3D pose and shape representations,
action2motion and motion2video. Action2motion stochastically generates
plausible 3D pose sequences of a prescribed action category, which are
processed and rendered by motion2video to form 2D videos. Specifically, the Lie
algebraic theory is engaged in representing natural human motions following the
physical law of human kinematics; a temporal variational auto-encoder (VAE) is
developed that encourages diversity of output motions. Moreover, given an
additional input image of a clothed human character, an entire pipeline is
proposed to extract his/her 3D detailed shape, and to render in videos the
plausible motions from different views. This is realized by improving existing
methods to extract 3D human shapes and textures from single 2D images, rigging,
animating, and rendering to form 2D videos of human motions. It also
necessitates the curation and reannotation of 3D human motion datasets for
training purpose. Thorough empirical experiments including ablation study,
qualitative and quantitative evaluations manifest the applicability of our
approach, and demonstrate its competitiveness in addressing related tasks,
where components of our approach are compared favorably to the
state-of-the-arts.
Related papers
- Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment [45.74813582690906]
Learning 3D human motion from 2D inputs is a fundamental task in the realms of computer vision and computer graphics.
We present the Video-to-Motion Generator (VTM), which leverages motion priors through cross-modal latent feature space alignment.
The VTM showcases state-of-the-art performance in reconstructing 3D human motion from monocular videos.
arXiv Detail & Related papers (2024-04-15T06:38:09Z) - Cinematic Behavior Transfer via NeRF-based Differentiable Filming [63.1622492808519]
Existing SLAM methods face limitations in dynamic scenes and human pose estimation often focuses on 2D projections.
We first introduce a reverse filming behavior estimation technique.
We then introduce a cinematic transfer pipeline that is able to transfer various shot types to a new 2D video or a 3D virtual environment.
arXiv Detail & Related papers (2023-11-29T15:56:58Z) - 3D Cinemagraphy from a Single Image [73.09720823592092]
We present 3D Cinemagraphy, a new technique that marries 2D image animation with 3D photography.
Given a single still image as input, our goal is to generate a video that contains both visual content animation and camera motion.
arXiv Detail & Related papers (2023-03-10T06:08:23Z) - Physically Plausible Animation of Human Upper Body from a Single Image [41.027391105867345]
We present a new method for generating controllable, dynamically responsive, and photorealistic human animations.
Given an image of a person, our system allows the user to generate Physically plausible Upper Body Animation (PUBA) using interaction in the image space.
arXiv Detail & Related papers (2022-12-09T09:36:59Z) - MotionBERT: A Unified Perspective on Learning Human Motion
Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
arXiv Detail & Related papers (2022-10-12T19:46:25Z) - Self-Supervised 3D Human Pose Estimation in Static Video Via Neural
Rendering [5.568218439349004]
Inferring 3D human pose from 2D images is a challenging and long-standing problem in the field of computer vision.
We present preliminary results for a method to estimate 3D pose from 2D video containing a single person.
arXiv Detail & Related papers (2022-10-10T09:24:07Z) - Learning Motion-Dependent Appearance for High-Fidelity Rendering of
Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large amount of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z) - MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion task from controlled environments to in-the-wild scenarios.
It is capable of body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z) - Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z) - Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.