NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action
- URL: http://arxiv.org/abs/2212.13660v1
- Date: Wed, 28 Dec 2022 01:40:32 GMT
- Title: NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action
- Authors: Kuan-Chieh Wang, Zhenzhen Weng, Maria Xenochristou, Joao Pedro Araujo, Jeffrey Gu, C. Karen Liu, Serena Yeung
- Abstract summary: We introduce the Neural Motion (NeMo) field to represent the underlying 3D motions across a set of videos of the same action.
NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection.
- Score: 24.67958500694608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of reconstructing 3D human motion has wide-ranging applications. Gold-standard motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware, and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing multi-view MoCap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion, making exciting applications like motion analysis and motion-driven animation accessible to the general public. However, the performance of existing HMR methods degrades when a video contains challenging, dynamic motion that is not represented in the MoCap datasets used for training. This limits their appeal, since dynamic motion is frequently the target of 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field, which is optimized to represent the underlying 3D motion across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where it outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo with 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action, and show that NeMo achieves better 3D reconstruction than various baselines.
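To make the core idea concrete, below is a minimal sketch of a motion field shared across several videos of the same action. It assumes PyTorch, a 17-joint skeleton, a weak-perspective camera, and illustrative network dimensions; none of these choices are claimed to match the paper's exact design. An MLP maps a canonical phase variable and a per-video latent code to 3D joints, and all parameters are fit jointly with a confidence-weighted 2D keypoint reprojection loss across videos.

```python
# Minimal sketch of a NeMo-style neural motion field. Architecture details,
# dimensions, and the weak-perspective camera are illustrative assumptions,
# not the paper's exact setup.
import torch
import torch.nn as nn

NUM_JOINTS = 17          # COCO-style joint count (assumption)
LATENT_DIM = 8           # per-video instance code size (assumption)


class NeuralMotionField(nn.Module):
    """Maps a canonical phase t in [0, 1] plus a per-video latent code to 3D joints."""

    def __init__(self, num_videos: int):
        super().__init__()
        self.video_codes = nn.Embedding(num_videos, LATENT_DIM)
        self.mlp = nn.Sequential(
            nn.Linear(1 + LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_JOINTS * 3),
        )
        # Per-video weak-perspective camera: (scale offset, tx, ty).
        self.cam = nn.Parameter(torch.zeros(num_videos, 3))

    def forward(self, t: torch.Tensor, video_idx: torch.Tensor) -> torch.Tensor:
        z = self.video_codes(video_idx)                       # (B, LATENT_DIM)
        joints3d = self.mlp(torch.cat([t[:, None], z], -1))   # (B, J*3)
        return joints3d.view(-1, NUM_JOINTS, 3)

    def project(self, joints3d: torch.Tensor, video_idx: torch.Tensor) -> torch.Tensor:
        """Weak-perspective projection of 3D joints into each video's image plane."""
        s, tx, ty = self.cam[video_idx].unbind(-1)
        xy = joints3d[..., :2]
        return (1.0 + s)[:, None, None] * xy + torch.stack([tx, ty], -1)[:, None, :]


def reprojection_loss(field, t, video_idx, keypoints2d, confidence):
    """Confidence-weighted 2D keypoint loss shared across all video instances."""
    pred2d = field.project(field(t, video_idx), video_idx)
    return (confidence[..., None] * (pred2d - keypoints2d) ** 2).mean()


# Toy usage with random detections standing in for off-the-shelf 2D keypoints.
field = NeuralMotionField(num_videos=4)
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
t = torch.rand(32)                         # canonical phase of each sampled frame
vid = torch.randint(0, 4, (32,))           # which video each frame comes from
kp2d = torch.randn(32, NUM_JOINTS, 2)      # detected 2D keypoints
conf = torch.rand(32, NUM_JOINTS)          # detection confidences
for _ in range(10):
    opt.zero_grad()
    loss = reprojection_loss(field, t, vid, kp2d, conf)
    loss.backward()
    opt.step()
```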
Related papers
- A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions [56.709280823844374]
We introduce a mask-based motion correction module (MCM) that leverages motion context and video mask to repair flawed motions.
We also propose a physics-based motion transfer module (PTM), which employs a pretrain and adapt approach for motion imitation.
Our approach is designed as a plug-and-play module to physically refine the video motion capture results, including high-difficulty in-the-wild motions.
arXiv Detail & Related papers (2024-12-23T08:26:00Z)
- Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera [49.82535393220003]
Dyn-HaMR is the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild.
We show that our approach significantly outperforms state-of-the-art methods in terms of 4D global mesh recovery.
This establishes a new benchmark for hand motion reconstruction from monocular video with moving cameras.
arXiv Detail & Related papers (2024-12-17T12:43:10Z)
- MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting [56.785233997533794]
We propose a novel deformable 3D Gaussian splatting framework called MotionGS.
MotionGS explores explicit motion priors to guide the deformation of 3D Gaussians.
Experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-10-10T08:19:47Z)
- ViMo: Generating Motions from Casual Videos [34.19904765033005]
We propose a novel Video-to-Motion-Generation framework (ViMo).
ViMo can leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions.
Striking experimental results demonstrate that the proposed model can generate natural motions even for videos with rapid movements, varying perspectives, or frequent occlusions.
arXiv Detail & Related papers (2024-08-13T03:57:35Z)
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects along different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z)
- HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations [7.096701481970196]
Head-Mounted Devices (HMDs) typically provide only a few input signals, such as the 6-DoF poses of the head and hands.
We propose the first unified approach, HMD-NeMo, that addresses plausible and accurate full body motion generation even when the hands may be only partially visible.
arXiv Detail & Related papers (2023-08-22T08:07:12Z)
- Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video [24.217269857183233]
We propose a motion pose and shape network (MPS-Net) that captures humans in motion to estimate 3D human pose and shape from a video.
Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in the sequence.
By coupling the MoCA and HAFI modules, the proposed MPS-Net excels in estimating 3D human pose and shape in the video.
arXiv Detail & Related papers (2022-03-16T11:00:24Z)
- MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios.
It is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z)
- SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos [40.19723456533343]
We propose SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input.
Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict the fine-grained semantic action attributes.
arXiv Detail & Related papers (2021-04-23T07:52:03Z)
- Motion Guided 3D Pose Estimation from Videos [81.14443206968444]
We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D poses.
In computing the motion loss, a simple yet effective representation for keypoint motion, called pairwise motion encoding, is introduced (an illustrative sketch of such an encoding appears after this list).
We design a new graph convolutional network architecture, U-shaped GCN (UGCN), which captures both short-term and long-term motion information.
arXiv Detail & Related papers (2020-04-29T06:59:30Z)
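As a side note on the motion-loss entry above, here is a minimal, hypothetical sketch of a pairwise motion encoding: keypoint displacements between frame pairs at several temporal offsets, compared between predictions and ground truth. The offsets, the L1 penalty, and the function names are assumptions for illustration and may differ from that paper's actual formulation.

```python
# Hypothetical sketch of a motion loss built on pairwise motion encoding
# (illustrative only; the exact encoding and weighting may differ from the paper).
import torch


def pairwise_motion_encoding(joints: torch.Tensor, offsets=(1, 2, 4)) -> torch.Tensor:
    """Encode keypoint motion as joint displacements between frame pairs.

    joints: (T, J, 3) sequence of 3D keypoints.
    Returns displacements concatenated over several temporal offsets.
    """
    encodings = []
    for d in offsets:
        encodings.append(joints[d:] - joints[:-d])   # (T - d, J, 3) displacements
    return torch.cat([e.flatten(1) for e in encodings], dim=0)


def motion_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Penalize discrepancies between predicted and ground-truth motion encodings."""
    return (pairwise_motion_encoding(pred) - pairwise_motion_encoding(target)).abs().mean()


# Toy usage: 16-frame sequences of 17 joints.
pred = torch.randn(16, 17, 3, requires_grad=True)
target = torch.randn(16, 17, 3)
loss = motion_loss(pred, target)
loss.backward()
```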