Synergistic Global-space Camera and Human Reconstruction from Videos
- URL: http://arxiv.org/abs/2405.14855v1
- Date: Thu, 23 May 2024 17:57:50 GMT
- Title: Synergistic Global-space Camera and Human Reconstruction from Videos
- Authors: Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj, Min Xu, Jimei Yang, Chun-Hao Paul Huang
- Abstract summary: This work introduces Synergistic Camera and Human Reconstruction (SynCHMR) to marry the best of both worlds.
Specifically, we design Human-aware Metric SLAM to reconstruct metric-scale camera poses and scene point clouds.
We further learn a Scene-aware SMPL Denoiser to enhance world-frame HMR by incorporating spatio-temporal coherency and dynamic scene constraints.
- Score: 41.309293977251855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remarkable strides have been made in reconstructing static scenes or human bodies from monocular videos. Yet, the two problems have largely been approached independently, without much synergy. Most visual SLAM methods can only reconstruct camera trajectories and scene structures up to scale, while most HMR methods reconstruct human meshes in metric scale but fall short in reasoning with cameras and scenes. This work introduces Synergistic Camera and Human Reconstruction (SynCHMR) to marry the best of both worlds. Specifically, we design Human-aware Metric SLAM to reconstruct metric-scale camera poses and scene point clouds using camera-frame HMR as a strong prior, addressing depth, scale, and dynamic ambiguities. Conditioning on the dense scene recovered, we further learn a Scene-aware SMPL Denoiser to enhance world-frame HMR by incorporating spatio-temporal coherency and dynamic scene constraints. Together, they lead to consistent reconstructions of camera trajectories, human meshes, and dense scene point clouds in a common world frame. Project page: https://paulchhuang.github.io/synchmr
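The scale ambiguity mentioned in the abstract can be illustrated with a toy example. The sketch below is not the SynCHMR pipeline; it only assumes that an up-to-scale SLAM system and a metric-scale HMR method each give a depth for the same human points, and it recovers one global scale factor as the median of their ratios. The function names `estimate_metric_scale` and `rescale_translations` and all the numbers are hypothetical.

```python
from statistics import median

def estimate_metric_scale(slam_depths, hmr_depths):
    """Return s such that s * slam_depth approximates hmr_depth (in metres).

    Illustrative assumption: both lists describe the same points, one from an
    up-to-scale SLAM reconstruction and one from a metric-scale HMR estimate.
    """
    ratios = [h / d for d, h in zip(slam_depths, hmr_depths) if d > 0 and h > 0]
    if not ratios:
        raise ValueError("no valid depth pairs")
    # The median keeps a few bad correspondences from dominating the estimate.
    return median(ratios)

def rescale_translations(translations, scale):
    """Apply the global scale to up-to-scale camera positions."""
    return [tuple(scale * c for c in t) for t in translations]

# Toy data: the last depth pair is a deliberate outlier.
slam_depths = [1.0, 1.5, 2.0, 2.5, 10.0]   # arbitrary SLAM units
hmr_depths = [2.0, 3.0, 4.0, 5.0, 4.0]     # metres
scale = estimate_metric_scale(slam_depths, hmr_depths)

camera_positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.1)]
metric_positions = rescale_translations(camera_positions, scale)
```

A single global scale is the simplest possible model. The abstract describes the actual method as also addressing depth and dynamic ambiguities, which this sketch does not attempt.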
Related papers
- Reconstructing People, Places, and Cameras [57.81696692335401]
"Humans and Structure from Motion" (HSfM) is a method for jointly reconstructing multiple human meshes, scene point clouds, and camera parameters in a metric world coordinate system.
Our results show that incorporating human data into the SfM pipeline improves camera pose estimation.
arXiv Detail & Related papers (2024-12-23T18:58:34Z)
- Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera [49.82535393220003]
Dyn-HaMR is the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild.
We show that our approach significantly outperforms state-of-the-art methods in terms of 4D global mesh recovery.
This establishes a new benchmark for hand motion reconstruction from monocular video with moving cameras.
arXiv Detail & Related papers (2024-12-17T12:43:10Z)
- UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos [25.41337525728398]
We introduce UniHOI, a model that unifies the estimation of all variables necessary for dense 4D reconstruction.
UniHOI is the first approach to offer fast, dense, and general monocular egocentric HOI scene reconstruction in the presence of motion.
arXiv Detail & Related papers (2024-11-14T02:57:11Z)
- EgoLocate: Real-time Motion Capture, Localization, and Mapping with Sparse Body-mounted Sensors [74.1275051763006]
We develop a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors.
Localization, a key challenge for both fields, is largely improved by our technique compared with the state of the art in each.
arXiv Detail & Related papers (2023-05-02T16:56:53Z)
- Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis [76.72505510632904]
We present Total-Recon, the first method to reconstruct deformable scenes from long monocular RGBD videos.
Our method hierarchically decomposes the scene into the background and objects, whose motion is decomposed into root-body motion and local articulations.
arXiv Detail & Related papers (2023-04-24T17:59:52Z)
- Decoupling Human and Camera Motion from Videos in the Wild [67.39432972193929]
We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
arXiv Detail & Related papers (2023-02-24T18:59:15Z)