NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same
Action
- URL: http://arxiv.org/abs/2212.13660v1
- Date: Wed, 28 Dec 2022 01:40:32 GMT
- Title: NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same
Action
- Authors: Kuan-Chieh Wang, Zhenzhen Weng, Maria Xenochristou, Joao Pedro Araujo,
Jeffrey Gu, C. Karen Liu, Serena Yeung
- Abstract summary: We introduce the Neural Motion (NeMo) field to represent the underlying 3D motions across a set of videos of the same action.
NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection.
- Score: 24.67958500694608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of reconstructing 3D human motion has wide-ranging
applications. The gold-standard motion capture (MoCap) systems are accurate but
inaccessible to the general public due to their cost, hardware, and space
constraints. In contrast, monocular human mesh recovery (HMR) methods are much
more accessible than MoCap as they take single-view videos as inputs. Replacing
multi-view MoCap systems with a monocular HMR method would break the current
barriers to collecting accurate 3D motion, thus making exciting applications
like motion analysis and motion-driven animation accessible to the general
public. However, the performance of existing HMR methods degrades when the
video contains challenging and dynamic motion that is not in the existing MoCap
datasets used for training. This reduces their appeal, as dynamic motion is
frequently the target of 3D motion
recovery in the aforementioned applications. Our study aims to bridge the gap
between monocular HMR and multi-view MoCap systems by leveraging information
shared across multiple video instances of the same action. We introduce the
Neural Motion (NeMo) field. It is optimized to represent the underlying 3D
motions across a set of videos of the same action. Empirically, we show that
NeMo can recover 3D motion in sports using videos from the Penn Action dataset,
where NeMo outperforms existing HMR methods in terms of 2D keypoint detection.
To further validate NeMo using 3D metrics, we collected a small MoCap dataset
mimicking actions in Penn Action, and show that NeMo achieves better 3D
reconstruction compared to various baselines.
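The abstract describes the NeMo field only at a high level. As a concrete illustration, the sketch below shows one plausible way such a field could be set up in PyTorch: an MLP over a normalized motion phase, conditioned on a learned per-video latent code, and fit by reprojecting its 3D joints with a weak-perspective camera onto detected 2D keypoints. The architecture, the latent codes, the camera model, and all names here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a NeMo-style neural motion field (PyTorch).
# Assumptions (not from the paper): the field is an MLP over a normalized
# phase t in [0, 1] plus a per-video latent code, it outputs 3D joints
# directly, and it is supervised via weak-perspective 2D reprojection.
import torch
import torch.nn as nn

NUM_JOINTS = 13  # Penn Action annotates 13 body joints

class NeuralMotionField(nn.Module):
    def __init__(self, num_videos, latent_dim=32, hidden=256):
        super().__init__()
        # One learned latent code per video instance of the action.
        self.video_codes = nn.Embedding(num_videos, latent_dim)
        self.mlp = nn.Sequential(
            nn.Linear(1 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_JOINTS * 3),  # 3D joints at phase t
        )

    def forward(self, t, video_idx):
        # t: (B, 1) normalized phase in [0, 1]; video_idx: (B,) instance ids.
        z = self.video_codes(video_idx)
        joints = self.mlp(torch.cat([t, z], dim=-1))
        return joints.view(-1, NUM_JOINTS, 3)

def reprojection_loss(joints_3d, keypoints_2d, scale, trans):
    # Weak-perspective projection: drop depth, then scale and translate.
    proj = scale[:, None, None] * joints_3d[..., :2] + trans[:, None, :]
    return ((proj - keypoints_2d) ** 2).mean()
```

In a setup like this, the MLP weights are shared across all video instances of the action, so every video's 2D evidence constrains the same underlying 3D motion, while the per-video codes absorb instance-specific variation.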
Related papers
- MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations [85.85596165472663]
We build MotionBank, which comprises 13 video action datasets, 1.24M motion sequences, and 132.9M frames of natural and diverse human motions.
Our MotionBank is beneficial for general motion-related tasks of human motion generation, motion in-context generation, and motion understanding.
arXiv Detail & Related papers (2024-10-17T17:31:24Z) - MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting [56.785233997533794]
We propose a novel deformable 3D Gaussian splatting framework called MotionGS.
MotionGS explores explicit motion priors to guide the deformation of 3D Gaussians.
Experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-10-10T08:19:47Z) - ViMo: Generating Motions from Casual Videos [34.19904765033005]
We propose a novel Video-to-Motion-Generation framework (ViMo)
ViMo could leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions.
Striking experimental results demonstrate that the proposed model can generate natural motions even for videos with rapid movements, varying perspectives, or frequent occlusions.
arXiv Detail & Related papers (2024-08-13T03:57:35Z) - Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cues of objects across different time frames are critical for 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z) - HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations [7.096701481970196]
Head-Mounted Devices (HMDs) typically only provide a few input signals, such as 6-DoF poses of the head and hands.
We propose the first unified approach, HMD-NeMo, that addresses plausible and accurate full body motion generation even when the hands may be only partially visible.
arXiv Detail & Related papers (2023-08-22T08:07:12Z) - D&D: Learning Human Dynamics from Dynamic Camera [55.60512353465175]
We present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from in-the-wild videos with a moving camera.
Our approach is entirely neural-based and runs without offline optimization or simulation in physics engines.
arXiv Detail & Related papers (2022-09-19T06:51:02Z) - Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape
Estimation from Monocular Video [24.217269857183233]
We propose a motion pose and shape network (MPS-Net) to capture humans in motion to estimate 3D human pose and shape from a video.
Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in the sequence.
By coupling the MoCA and hierarchical attentive feature integration (HAFI) modules, the proposed MPS-Net excels in estimating 3D human pose and shape in the video.
arXiv Detail & Related papers (2022-03-16T11:00:24Z) - MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios.
It is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z) - SportsCap: Monocular 3D Human Motion Capture and Fine-grained
Understanding in Challenging Sports Videos [40.19723456533343]
We propose SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input.
Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict the fine-grained semantic action attributes.
arXiv Detail & Related papers (2021-04-23T07:52:03Z) - Motion Guided 3D Pose Estimation from Videos [81.14443206968444]
We propose a new loss function, called motion loss, for the problem of monocular 3D Human pose estimation from 2D pose.
In computing motion loss, a simple yet effective representation for keypoint motion, called pairwise motion encoding, is introduced.
We design a new graph convolutional network architecture, U-shaped GCN (UGCN), which captures both short-term and long-term motion information.
arXiv Detail & Related papers (2020-04-29T06:59:30Z)
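The last entry above only names its key ingredients, a motion loss built on a pairwise motion encoding. As a rough, unverified reading of what such an encoding could look like, the sketch below encodes motion as the temporal change of every pairwise keypoint offset and penalizes the discrepancy between predicted and ground-truth encodings; this is an illustrative guess, not the paper's exact formulation.

```python
# Illustrative guess at a pairwise-motion-encoding loss (PyTorch), not the
# formulation from the UGCN paper.
import torch

def pairwise_motion_encoding(joints):
    # joints: (T, J, 3) keypoint sequence.
    # Pairwise offsets between all keypoints at each frame: (T, J, J, 3).
    offsets = joints[:, :, None, :] - joints[:, None, :, :]
    # Motion = temporal change of each pairwise offset: (T - 1, J, J, 3).
    return offsets[1:] - offsets[:-1]

def motion_loss(pred, gt):
    # Mismatch between predicted and ground-truth motion encodings.
    return (pairwise_motion_encoding(pred) - pairwise_motion_encoding(gt)).abs().mean()
```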