World-Grounded Human Motion Recovery via Gravity-View Coordinates
- URL: http://arxiv.org/abs/2409.06662v1
- Date: Tue, 10 Sep 2024 17:25:47 GMT
- Title: World-Grounded Human Motion Recovery via Gravity-View Coordinates
- Authors: Zehong Shen, Huaijin Pi, Yan Xia, Zhi Cen, Sida Peng, Zechen Hu, Hujun Bao, Ruizhen Hu, Xiaowei Zhou
- Abstract summary: We propose estimating human poses in a novel Gravity-View coordinate system.
The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame.
Our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed.
- Score: 60.618543026949226
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a novel method for recovering world-grounded human motion from monocular video. The main challenge lies in the ambiguity of defining the world coordinate system, which varies between sequences. Previous approaches attempt to alleviate this issue by predicting relative motion in an autoregressive manner, but are prone to accumulating errors. Instead, we propose estimating human poses in a novel Gravity-View (GV) coordinate system, which is defined by the world gravity and the camera view direction. The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame, largely reducing the ambiguity of learning image-pose mapping. The estimated poses can be transformed back to the world coordinate system using camera rotations, forming a global motion sequence. Additionally, the per-frame estimation avoids error accumulation in the autoregressive methods. Experiments on in-the-wild benchmarks demonstrate that our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed. The code is available at https://zju3dv.github.io/gvhmr/.
Related papers
- COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation [98.05046790227561]
COIN is a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions.
COIN outperforms the state-of-the-art methods in terms of global human motion estimation and camera motion estimation.
arXiv Detail & Related papers (2024-08-29T10:36:29Z)
- WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion [43.95997922499137]
WHAM (World-grounded Humans with Accurate Motion) reconstructs 3D human motion in a global coordinate system from video.
It uses camera angular velocity estimated by a SLAM method, together with human motion, to estimate the body's global trajectory.
WHAM outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
arXiv Detail & Related papers (2023-12-12T18:57:46Z)
- W-HMR: Monocular Human Mesh Recovery in World Space with Weak-Supervised Calibration [57.37135310143126]
Previous methods for 3D motion recovery from monocular images often fall short due to reliance on camera coordinates.
We introduce W-HMR, a weak-supervised calibration method that predicts "reasonable" focal lengths based on body distortion information.
We also present the OrientCorrect module, which corrects body orientation for plausible reconstructions in world space.
arXiv Detail & Related papers (2023-11-29T09:02:07Z)
- Decoupling Human and Camera Motion from Videos in the Wild [67.39432972193929]
We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
arXiv Detail & Related papers (2023-02-24T18:59:15Z)
- GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras [99.07219478953982]
We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras.
We first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions.
In contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras.
arXiv Detail & Related papers (2021-12-02T18:59:54Z)
- Camera Motion Agnostic 3D Human Pose Estimation [8.090223360924004]
This paper presents a camera motion agnostic approach for predicting 3D human pose and mesh defined in the world coordinate system.
We propose a network based on bidirectional gated recurrent units (GRUs) that predicts the global motion sequence from the local pose sequence.
We use 3DPW and synthetic datasets, which are constructed in a moving-camera environment, for evaluation.
arXiv Detail & Related papers (2021-12-01T08:22:50Z)
- Gravity-Aware Monocular 3D Human-Object Reconstruction [73.25185274561139]
This paper proposes a new approach for joint markerless 3D human motion capture and object trajectory estimation from monocular RGB videos.
We focus on scenes with objects partially observed while in free flight.
In the experiments, our approach achieves state-of-the-art accuracy in 3D human motion capture on various metrics.
arXiv Detail & Related papers (2021-08-19T17:59:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.