Embodied Scene-aware Human Pose Estimation
- URL: http://arxiv.org/abs/2206.09106v1
- Date: Sat, 18 Jun 2022 03:50:19 GMT
- Title: Embodied Scene-aware Human Pose Estimation
- Authors: Zhengyi Luo, Shun Iwase, Ye Yuan, Kris Kitani
- Abstract summary: We propose embodied scene-aware human pose estimation.
Our method is one stage, causal, and recovers global 3D human poses in a simulated environment.
- Score: 25.094152307452
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose embodied scene-aware human pose estimation where we estimate 3D
poses based on a simulated agent's proprioception and scene awareness, along
with external third-person observations. Unlike prior methods that often resort
to multistage optimization, non-causal inference, and complex contact modeling
to estimate human pose and human scene interactions, our method is one stage,
causal, and recovers global 3D human poses in a simulated environment. Since 2D
third-person observations are coupled with the camera pose, we propose to
disentangle the camera pose and use a multi-step projection gradient defined in
the global coordinate frame as the movement cue for our embodied agent.
Leveraging a physics simulation and prescanned scenes (e.g., 3D mesh), we
simulate our agent in everyday environments (libraries, offices, bedrooms,
etc.) and equip our agent with environmental sensors to intelligently navigate
and interact with scene geometries. Our method also relies only on 2D keypoints
and can be trained on synthetic datasets derived from popular human motion
databases. To evaluate, we use the popular H36M and PROX datasets and, for the
first time, achieve a success rate of 96.7% on the challenging PROX dataset
without ever using PROX motion sequences for training.
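The abstract's central idea of a "multi-step projection gradient defined in the global coordinate frame" can be illustrated with a minimal sketch: starting from current global 3D joint estimates, take a few gradient steps on the 2D reprojection error and use the accumulated displacement (rather than the 2D residual itself, which is entangled with the camera pose) as a movement cue. The sketch below is a hypothetical illustration, not the paper's implementation; the pinhole projection, numerical gradient, step count, and step size are all assumptions.

```python
import numpy as np

def project(points3d, K, R, t):
    """Pinhole projection of Nx3 world-frame points to Nx2 pixel coordinates."""
    cam = (R @ points3d.T + t[:, None]).T      # world frame -> camera frame
    uv = (K @ cam.T).T                         # camera frame -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]              # perspective divide

def multistep_projection_gradient(joints3d, kp2d, K, R, t, steps=5, lr=1e-5):
    """Hypothetical sketch: run a few gradient-descent steps on the 2D
    reprojection error with respect to the *global* 3D joints, and return
    the total displacement as a camera-disentangled movement cue."""
    x = joints3d.copy()
    eps = 1e-4
    for _ in range(steps):
        # Numerical gradient of 0.5 * ||project(x) - kp2d||^2 w.r.t. each coordinate.
        grad = np.zeros_like(x)
        for i in range(x.size):
            xp = x.copy().ravel(); xp[i] += eps
            xm = x.copy().ravel(); xm[i] -= eps
            lp = 0.5 * np.sum((project(xp.reshape(x.shape), K, R, t) - kp2d) ** 2)
            lm = 0.5 * np.sum((project(xm.reshape(x.shape), K, R, t) - kp2d) ** 2)
            grad.ravel()[i] = (lp - lm) / (2 * eps)
        x = x - lr * grad
    return x - joints3d   # displacement in the global frame (the movement cue)
```

Because the displacement lives in the global coordinate frame, the cue stays meaningful regardless of where the third-person camera happens to be; the simulated agent can consume it alongside its proprioceptive state.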
Related papers
- WHAC: World-grounded Humans and Cameras [37.877565981937586]
We aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly.
We introduce a novel framework, referred to as WHAC, to facilitate world-grounded expressive human pose and shape estimation.
We present a new synthetic dataset, WHAC-A-Mole, which includes accurately annotated humans and cameras.
arXiv Detail & Related papers (2024-03-19T17:58:02Z)
- Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers [28.38686299271394]
We propose a framework for 3D sequence-to-sequence (seq2seq) human pose detection.
The spatial module represents the human pose feature from intra-image content, while the frame-image relation module extracts temporal relationships across frames.
Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset.
arXiv Detail & Related papers (2024-01-30T03:00:25Z)
- Human Pose Estimation in Monocular Omnidirectional Top-View Images [3.07869141026886]
We propose a new dataset for training and evaluation of CNNs for the task of keypoint detection in omnidirectional images.
The training dataset, THEODORE+, consists of 50,000 images and is created by a 3D rendering engine.
For evaluation purposes, the real-world PoseFES dataset with two scenarios and 701 frames with up to eight persons per scene was captured and annotated.
arXiv Detail & Related papers (2023-04-17T11:52:04Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- Scene-aware Egocentric 3D Human Pose Estimation [72.57527706631964]
Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality.
Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene.
We propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints.
arXiv Detail & Related papers (2022-12-20T21:35:39Z)
- HSPACE: Synthetic Parametric Humans Animated in Complex Environments [67.8628917474705]
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments.
We combine a hundred diverse individuals of varying age, gender, body proportions, and ethnicity with hundreds of motions and scenes to generate an initial dataset of over 1 million frames.
Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z)
- Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video.
Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
- Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors [71.29186299435423]
We introduce the Human POSEitioning System (HPS), a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment.
We show that our optimization-based integration exploits the benefits of the two, resulting in pose accuracy free of drift.
HPS could be used for VR/AR applications where humans interact with the scene without requiring direct line of sight with an external camera.
arXiv Detail & Related papers (2021-03-31T17:58:31Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.