NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action
- URL: http://arxiv.org/abs/2212.13660v1
- Date: Wed, 28 Dec 2022 01:40:32 GMT
- Title: NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action
- Authors: Kuan-Chieh Wang, Zhenzhen Weng, Maria Xenochristou, Joao Pedro Araujo, Jeffrey Gu, C. Karen Liu, Serena Yeung
- Abstract summary: We introduce the Neural Motion (NeMo) field to represent the underlying 3D motions across a set of videos of the same action.
NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection.
- Score: 24.67958500694608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of reconstructing 3D human motion has wide-ranging applications. Gold-standard motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware, and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing multi-view MoCap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion, making exciting applications like motion analysis and motion-driven animation accessible to the general public. However, the performance of existing HMR methods degrades when a video contains challenging, dynamic motion that is not represented in the MoCap datasets used for training. This limits their appeal, since dynamic motion is frequently the target of 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field, which is optimized to represent the underlying 3D motion across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where it outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo with 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action, and show that NeMo achieves better 3D reconstruction than various baselines.
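To make the core idea concrete, below is a minimal sketch of a motion field shared across several videos of the same action. It assumes PyTorch, a 17-joint skeleton, a weak-perspective camera, and illustrative network dimensions; none of these choices are claimed to match the paper's exact design. An MLP maps a canonical phase variable and a per-video latent code to 3D joints, and all parameters are fit jointly with a confidence-weighted 2D keypoint reprojection loss across videos.

```python
# Minimal sketch of a NeMo-style neural motion field. Architecture details,
# dimensions, and the weak-perspective camera are illustrative assumptions,
# not the paper's exact setup.
import torch
import torch.nn as nn

NUM_JOINTS = 17          # COCO-style joint count (assumption)
LATENT_DIM = 8           # per-video instance code size (assumption)


class NeuralMotionField(nn.Module):
    """Maps a canonical phase t in [0, 1] plus a per-video latent code to 3D joints."""

    def __init__(self, num_videos: int):
        super().__init__()
        self.video_codes = nn.Embedding(num_videos, LATENT_DIM)
        self.mlp = nn.Sequential(
            nn.Linear(1 + LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_JOINTS * 3),
        )
        # Per-video weak-perspective camera: (scale offset, tx, ty).
        self.cam = nn.Parameter(torch.zeros(num_videos, 3))

    def forward(self, t: torch.Tensor, video_idx: torch.Tensor) -> torch.Tensor:
        z = self.video_codes(video_idx)                       # (B, LATENT_DIM)
        joints3d = self.mlp(torch.cat([t[:, None], z], -1))   # (B, J*3)
        return joints3d.view(-1, NUM_JOINTS, 3)

    def project(self, joints3d: torch.Tensor, video_idx: torch.Tensor) -> torch.Tensor:
        """Weak-perspective projection of 3D joints into each video's image plane."""
        s, tx, ty = self.cam[video_idx].unbind(-1)
        xy = joints3d[..., :2]
        return (1.0 + s)[:, None, None] * xy + torch.stack([tx, ty], -1)[:, None, :]


def reprojection_loss(field, t, video_idx, keypoints2d, confidence):
    """Confidence-weighted 2D keypoint loss shared across all video instances."""
    pred2d = field.project(field(t, video_idx), video_idx)
    return (confidence[..., None] * (pred2d - keypoints2d) ** 2).mean()


# Toy usage with random detections standing in for off-the-shelf 2D keypoints.
field = NeuralMotionField(num_videos=4)
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
t = torch.rand(32)                         # canonical phase of each sampled frame
vid = torch.randint(0, 4, (32,))           # which video each frame comes from
kp2d = torch.randn(32, NUM_JOINTS, 2)      # detected 2D keypoints
conf = torch.rand(32, NUM_JOINTS)          # detection confidences
for _ in range(10):
    opt.zero_grad()
    loss = reprojection_loss(field, t, vid, kp2d, conf)
    loss.backward()
    opt.step()
```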
Related papers
- A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions [56.709280823844374]
We introduce a mask-based motion correction module (MCM) that leverages motion context and video mask to repair flawed motions.
We also propose a physics-based motion transfer module (PTM), which employs a pretrain and adapt approach for motion imitation.
Our approach is designed as a plug-and-play module to physically refine the video motion capture results, including high-difficulty in-the-wild motions.
arXiv Detail & Related papers (2024-12-23T08:26:00Z)
- Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera [49.82535393220003]
Dyn-HaMR is the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild.
We show that our approach significantly outperforms state-of-the-art methods in terms of 4D global mesh recovery.
This establishes a new benchmark for hand motion reconstruction from monocular video with moving cameras.
arXiv Detail & Related papers (2024-12-17T12:43:10Z)
- MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting [56.785233997533794]
We propose a novel deformable 3D Gaussian splatting framework called MotionGS.
MotionGS explores explicit motion priors to guide the deformation of 3D Gaussians.
Experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-10-10T08:19:47Z)
- ViMo: Generating Motions from Casual Videos [34.19904765033005]
We propose a novel Video-to-Motion-Generation framework (ViMo).
ViMo can leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions.
Striking experimental results demonstrate that the proposed model can generate natural motions even for videos with rapid movements, varying perspectives, or frequent occlusions.
arXiv Detail & Related papers (2024-08-13T03:57:35Z)
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects along different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z)
- HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations [7.096701481970196]
Head-Mounted Devices (HMDs) typically provide only a few input signals, such as the 6-DoF poses of the head and hands.
We propose the first unified approach, HMD-NeMo, that addresses plausible and accurate full body motion generation even when the hands may be only partially visible.
arXiv Detail & Related papers (2023-08-22T08:07:12Z)
- Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video [24.217269857183233]
We propose a motion pose and shape network (MPS-Net) that captures humans in motion to estimate 3D human pose and shape from a video.
Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in the sequence.
By coupling the MoCA and HAFI modules, the proposed MPS-Net excels in estimating 3D human pose and shape in the video.
arXiv Detail & Related papers (2022-03-16T11:00:24Z)
- MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios.
It is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z)
- SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos [40.19723456533343]
We propose SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input.
Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict the fine-grained semantic action attributes.
arXiv Detail & Related papers (2021-04-23T07:52:03Z)
- Motion Guided 3D Pose Estimation from Videos [81.14443206968444]
We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D poses.
In computing the motion loss, a simple yet effective representation for keypoint motion, called pairwise motion encoding, is introduced (an illustrative sketch of such an encoding appears after this list).
We design a new graph convolutional network architecture, U-shaped GCN (UGCN), which captures both short-term and long-term motion information.
arXiv Detail & Related papers (2020-04-29T06:59:30Z)
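As a side note on the motion-loss entry above, here is a minimal, hypothetical sketch of a pairwise motion encoding: keypoint displacements between frame pairs at several temporal offsets, compared between predictions and ground truth. The offsets, the L1 penalty, and the function names are assumptions for illustration and may differ from that paper's actual formulation.

```python
# Hypothetical sketch of a motion loss built on pairwise motion encoding
# (illustrative only; the exact encoding and weighting may differ from the paper).
import torch


def pairwise_motion_encoding(joints: torch.Tensor, offsets=(1, 2, 4)) -> torch.Tensor:
    """Encode keypoint motion as joint displacements between frame pairs.

    joints: (T, J, 3) sequence of 3D keypoints.
    Returns displacements concatenated over several temporal offsets.
    """
    encodings = []
    for d in offsets:
        encodings.append(joints[d:] - joints[:-d])   # (T - d, J, 3) displacements
    return torch.cat([e.flatten(1) for e in encodings], dim=0)


def motion_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Penalize discrepancies between predicted and ground-truth motion encodings."""
    return (pairwise_motion_encoding(pred) - pairwise_motion_encoding(target)).abs().mean()


# Toy usage: 16-frame sequences of 17 joints.
pred = torch.randn(16, 17, 3, requires_grad=True)
target = torch.randn(16, 17, 3)
loss = motion_loss(pred, target)
loss.backward()
```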