Human POSEitioning System (HPS): 3D Human Pose Estimation and
Self-localization in Large Scenes from Body-Mounted Sensors
- URL: http://arxiv.org/abs/2103.17265v1
- Date: Wed, 31 Mar 2021 17:58:31 GMT
- Title: Human POSEitioning System (HPS): 3D Human Pose Estimation and
Self-localization in Large Scenes from Body-Mounted Sensors
- Authors: Vladimir Guzov, Aymen Mir, Torsten Sattler, Gerard Pons-Moll
- Abstract summary: We introduce the Human POSEitioning System (HPS), a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment.
We show that our optimization-based integration exploits the benefits of both, resulting in drift-free pose estimates.
HPS could be used for VR/AR applications where humans interact with the scene without requiring direct line of sight to an external camera.
- Score: 71.29186299435423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the Human POSEitioning System (HPS), a method to recover
the full 3D pose of a human registered with a 3D scan of the surrounding
environment using wearable sensors. Using IMUs attached to the body limbs and a
head-mounted camera looking outwards, HPS fuses camera-based self-localization
with IMU-based human body tracking. The former provides drift-free but noisy
position and orientation estimates, while the latter is accurate in the short
term but drifts over longer periods of time. We show that our optimization-based
integration exploits the benefits of both, resulting in drift-free pose
estimates. Furthermore, we integrate 3D scene constraints
into our optimization, such as foot contact with the ground, resulting in
physically plausible motion. HPS complements more common third-person-based 3D
pose estimation methods. It allows capturing larger recording volumes and
longer periods of motion, and could be used for VR/AR applications where humans
interact with the scene without requiring direct line of sight to an external
camera, or to train agents that navigate and interact with the environment
based on first-person visual input, like real humans. With HPS, we recorded a
dataset of humans interacting with large 3D scenes (300-1000 sq.m) consisting
of 7 subjects and more than 3 hours of diverse motion. The dataset, code and
video will be available on the project page:
http://virtualhumans.mpi-inf.mpg.de/hps/ .
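
The fusion described in the abstract can be pictured as one least-squares problem over the trajectory. Below is a minimal toy sketch in Python (NumPy/SciPy), not the authors' implementation: a camera term pulls positions toward the drift-free but noisy self-localization fixes, an IMU term matches frame-to-frame displacements, and a soft foot-contact term suppresses sliding while a foot is on the ground. All names, weights, and input shapes are illustrative assumptions.

```python
# Toy sketch (not the HPS implementation): fuse drift-free-but-noisy camera
# positions with smooth-but-drifting IMU displacements, plus a soft
# foot-contact (zero-velocity) constraint, in a single least-squares problem.
import numpy as np
from scipy.optimize import least_squares

def fuse_trajectory(cam_pos, imu_disp, contact, w_cam=1.0, w_imu=10.0, w_con=100.0):
    """cam_pos: (T, 3) noisy global positions from camera self-localization.
    imu_disp: (T-1, 3) frame-to-frame displacements from IMU tracking.
    contact:  (T-1,) bool, True when a foot is in static ground contact."""
    T = cam_pos.shape[0]

    def residuals(x):
        p = x.reshape(T, 3)
        vel = p[1:] - p[:-1]
        r_cam = w_cam * (p - cam_pos).ravel()       # stay near camera fixes
        r_imu = w_imu * (vel - imu_disp).ravel()    # follow IMU motion locally
        r_con = w_con * vel[contact].ravel()        # no sliding during contact
        return np.concatenate([r_cam, r_imu, r_con])

    sol = least_squares(residuals, cam_pos.ravel())  # init from camera track
    return sol.x.reshape(T, 3)
```

The camera term anchors the global position (removing IMU drift), the IMU term preserves short-term motion detail, and the contact term stands in for the paper's 3D scene constraints.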
Related papers
- Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset [52.22758311559]
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot.
The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors.
The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users.
arXiv Detail & Related papers (2024-03-21T14:53:50Z)
- WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion [43.95997922499137]
WHAM (World-grounded Humans with Accurate Motion) reconstructs 3D human motion in a global coordinate system from video.
It uses camera angular velocity estimated from a SLAM method together with human motion to estimate the body's global trajectory; a toy sketch of this trajectory integration appears after this list.
WHAM outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
arXiv Detail & Related papers (2023-12-12T18:57:46Z)
- TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments [106.80978555346958]
Current methods cannot reliably estimate moving humans in global coordinates.
TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras.
It achieves state-of-the-art performance on tracking and HPS benchmarks.
arXiv Detail & Related papers (2023-06-05T13:00:44Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles; a minimal sketch of this kind of scale fitting appears after this list.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- Embodied Scene-aware Human Pose Estimation [25.094152307452]
We propose embodied scene-aware human pose estimation.
Our method is one-stage and causal, and it recovers global 3D human poses in a simulated environment.
arXiv Detail & Related papers (2022-06-18T03:50:19Z)
- Human-Aware Object Placement for Visual Environment Reconstruction [63.14733166375534]
We show that human-scene interactions can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video.
Our key idea is that, as a person moves through a scene and interacts with it, we accumulate human-scene interactions (HSIs) across multiple input images.
We show that our scene reconstruction can be used to refine the initial 3D human pose and shape estimation.
arXiv Detail & Related papers (2022-03-07T18:59:02Z)
- Learning Motion Priors for 4D Human Body Capture in 3D Scenes [81.54377747405812]
We propose LEMO: LEarning human MOtion priors for 4D human body capture.
We introduce a novel motion prior, which reduces the jitter exhibited by poses recovered over a sequence; a hand-crafted smoothing analogue is sketched after this list.
We also design a contact friction term and a contact-aware motion infiller obtained via per-instance self-supervised training.
With our pipeline, we demonstrate high-quality 4D human body capture, reconstructing smooth motions and physically plausible body-scene interactions.
arXiv Detail & Related papers (2021-08-23T20:47:09Z)
- AnimePose: Multi-person 3D pose estimation and animation [9.323689681059504]
3D animation of humans in action is challenging, as it typically requires an elaborate setup with motion trackers placed all over the person's body to track the movement of every limb.
This is time-consuming, and wearing body suits fitted with motion sensors can be uncomfortable for the person.
We present a solution to generate 3D animation of multiple persons from a 2D video using deep learning.
arXiv Detail & Related papers (2020-02-06T11:11:56Z)
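
As referenced in the WHAM entry above, here is a toy illustration of that style of global trajectory recovery: integrate angular velocity from SLAM into a world orientation and rotate body-local root velocities into the world frame before accumulating. WHAM itself predicts these quantities with learned networks; the interfaces below are assumptions.

```python
# Toy sketch (not the WHAM implementation): accumulate a world trajectory by
# integrating SLAM angular velocity into an orientation and rotating
# body-local root velocities into the world frame.
import numpy as np
from scipy.spatial.transform import Rotation as R

def integrate_global_trajectory(local_root_vel, angular_vel, dt=1.0 / 30.0):
    """local_root_vel: (T, 3) root velocity in the body-local frame (m/s).
    angular_vel:      (T, 3) angular velocity from SLAM (rad/s)."""
    rot = R.from_rotvec(np.zeros(3))        # world orientation, starts at identity
    pos = np.zeros(3)
    traj = [pos.copy()]
    for v_local, w in zip(local_root_vel, angular_vel):
        rot = rot * R.from_rotvec(w * dt)   # integrate orientation
        pos = pos + rot.apply(v_local) * dt # step in the world frame
        traj.append(pos.copy())
    return np.array(traj)                   # (T+1, 3) world positions
```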
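Also referenced above, a minimal sketch of the kind of disparity-to-depth scale fitting the Scene-Aware entry suggests: fit an affine mapping from normalized disparity to inverse depth so that back-projected bone lengths at the 2D joints match the body model's known lengths. The mapping, interfaces, and initialization are assumptions, not the paper's exact formulation.

```python
# Toy sketch: recover metric depth scale for a person from normalized
# disparity at 2D joints, by matching back-projected bone lengths to
# known body-model bone lengths (assumed formulation, for illustration).
import numpy as np
from scipy.optimize import least_squares

def backproject(px, z, fx, fy, cx, cy):
    # Pinhole back-projection of pixels px (N, 2) at depths z (N,).
    X = (px[:, 0] - cx) / fx * z
    Y = (px[:, 1] - cy) / fy * z
    return np.stack([X, Y, z], axis=1)

def fit_depth_scale(joints_2d, disparity, bones, bone_len, fx, fy, cx, cy):
    """joints_2d: (J, 2) pixel coords; disparity: (J,) normalized disparity;
    bones: (B, 2) joint-index pairs; bone_len: (B,) metric lengths."""
    def residuals(params):
        a, b = params                    # affine disparity -> inverse depth
        z = 1.0 / (a * disparity + b)    # assumes a*d + b stays positive
        P = backproject(joints_2d, z, fx, fy, cx, cy)
        est = np.linalg.norm(P[bones[:, 0]] - P[bones[:, 1]], axis=1)
        return est - bone_len
    return least_squares(residuals, x0=[1.0, 0.5]).x  # fitted (a, b)
```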
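Finally, the smoothing analogue referenced in the LEMO entry: LEMO learns its motion prior from data, so the hand-crafted acceleration penalty below only illustrates where such a prior plugs into sequence optimization; everything here is an illustrative stand-in.

```python
# Toy stand-in for a learned motion prior: fit a pose sequence to noisy
# per-frame joint estimates while penalizing finite-difference acceleration,
# which suppresses jitter (LEMO learns this prior instead).
import numpy as np
from scipy.optimize import least_squares

def smooth_sequence(joints, w_data=1.0, w_smooth=5.0):
    """joints: (T, J, 3) noisy per-frame 3D joint positions."""
    T, J, _ = joints.shape

    def residuals(x):
        p = x.reshape(T, J, 3)
        r_data = w_data * (p - joints).ravel()   # stay near observations
        accel = p[2:] - 2.0 * p[1:-1] + p[:-2]   # second finite difference
        return np.concatenate([r_data, w_smooth * accel.ravel()])

    sol = least_squares(residuals, joints.ravel())
    return sol.x.reshape(T, J, 3)
```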