SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in
Urban Environments
- URL: http://arxiv.org/abs/2303.09095v2
- Date: Sat, 18 Mar 2023 13:44:08 GMT
- Title: SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in
Urban Environments
- Authors: Yudi Dai (1), Yitai Lin (1), Xiping Lin (2), Chenglu Wen (1), Lan Xu
(2), Hongwei Yi (3), Siqi Shen (1), Yuexin Ma (2), Cheng Wang (1) ((1) Xiamen
University, China, (2) ShanghaiTech University, China, (3) Max Planck
Institute for Intelligent Systems, Germany)
- Abstract summary: We present SLOPER4D, a novel scene-aware dataset collected in large urban environments.
We record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view.
SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present SLOPER4D, a novel scene-aware dataset collected in large urban
environments to facilitate the research of global human pose estimation (GHPE)
with human-scene interaction in the wild. Employing a head-mounted device
integrated with a LiDAR and camera, we record 12 human subjects' activities
over 10 diverse urban scenes from an egocentric view. Frame-wise annotations
for 2D key points, 3D pose parameters, and global translations are provided,
together with reconstructed scene point clouds. To obtain accurate 3D ground
truth in such large dynamic scenes, we propose a joint optimization method to
fit local SMPL meshes to the scene and fine-tune the camera calibration during
dynamic motions frame by frame, resulting in plausible and scene-natural 3D
human poses. Eventually, SLOPER4D consists of 15 sequences of human motions,
each of which has a trajectory length of more than 200 meters (up to 1,300
meters) and covers an area of more than 2,000 $m^2$ (up to 13,000 $m^2$),
including more than 100K LiDAR frames, 300K video frames, and 500K IMU-based
motion frames. With SLOPER4D, we provide a detailed and thorough analysis of
two critical tasks, camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates that SLOPER4D poses significant challenges to existing methods and opens up great research opportunities. The dataset and code are released at http://www.lidarhumanmotion.net/sloper4d/.
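To make the method concrete, here is a minimal, illustrative sketch (not the authors' released code) of the kind of per-frame joint objective the abstract describes: a scene term that keeps the posed SMPL mesh consistent with the LiDAR point cloud, plus a reprojection term that fine-tunes the camera extrinsics against the 2D keypoint annotations. All tensor shapes, names, and weights are assumptions for illustration.

import torch

def frame_loss(smpl_verts,    # (V, 3) posed SMPL vertices in world coordinates
               joints_3d,     # (J, 3) SMPL joints in world coordinates
               scene_points,  # (N, 3) scene point cloud near the subject
               keypoints_2d,  # (J, 2) annotated 2D keypoints for this frame
               K, R, t,       # camera intrinsics and (refinable) extrinsics
               w_scene=1.0, w_reproj=0.1):
    # Scene term: pull each mesh vertex toward its nearest scene point so the
    # body neither floats above nor sinks into the reconstructed scene.
    dists = torch.cdist(smpl_verts, scene_points)   # (V, N) pairwise distances
    scene_term = dists.min(dim=1).values.clamp(max=0.5).mean()

    # Reprojection term: project the 3D joints with the current extrinsics and
    # compare against the frame's 2D keypoint annotations.
    cam = R @ joints_3d.T + t[:, None]              # (3, J) camera coordinates
    proj = K @ cam                                  # (3, J) homogeneous pixels
    uv = (proj[:2] / proj[2:]).T                    # (J, 2) pixel coordinates
    reproj_term = (uv - keypoints_2d).norm(dim=1).mean()

    return w_scene * scene_term + w_reproj * reproj_term

In a full pipeline, the SMPL pose parameters, global translation, and camera extrinsics would be optimized jointly over this loss frame by frame (with the rotation suitably parameterized, e.g., as axis-angle), likely together with temporal smoothness terms; the sketch covers only a single frame.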
Related papers
- HiSC4D: Human-centered interaction and 4D Scene Capture in Large-scale Space Using Wearable IMUs and LiDAR [43.43745311617461]
We introduce HiSC4D, a novel Human-centered interaction and 4D Scene Capture method.
By utilizing body-mounted IMUs and a head-mounted LiDAR, HiSC4D can capture egocentric human motions in unconstrained space.
We present a dataset containing 8 sequences in 4 large scenes (200 to 5,000 $m^2$), providing 36K frames of accurate 4D human motions.
arXiv Detail & Related papers (2024-09-06T16:43:04Z)
- Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation [70.82403156865057]
We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects.
Our experiments show that agents trained on our smaller-scale dataset can match or outperform agents trained on much larger datasets.
arXiv Detail & Related papers (2023-06-20T05:07:23Z)
- TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments [106.80978555346958]
Current methods can't reliably estimate moving humans in global coordinates.
TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras.
It achieves state-of-the-art performance on tracking and HPS benchmarks.
arXiv Detail & Related papers (2023-06-05T13:00:44Z)
- CIRCLE: Capture In Rich Contextual Environments [69.97976304918149]
We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world.
We present CIRCLE, a dataset containing 10 hours of full-body reaching motion from 5 subjects across 9 scenes.
We use this dataset to train a model that generates human motion conditioned on scene information.
arXiv Detail & Related papers (2023-03-31T09:18:12Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles (see the sketch after this list).
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- Embodied Scene-aware Human Pose Estimation [25.094152307452]
We propose embodied scene-aware human pose estimation.
Our method is one-stage and causal, and recovers global 3D human poses in a simulated environment.
arXiv Detail & Related papers (2022-06-18T03:50:19Z)
- HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR [51.9200422793806]
Using only body-mounted IMUs and a LiDAR, HSC4D is space-free, with no constraints from external devices, and map-free, requiring no pre-built maps.
Relationships between humans and environments are also explored to make their interaction more realistic.
arXiv Detail & Related papers (2022-03-17T10:05:55Z)
- Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors [71.29186299435423]
We introduce the Human POSEitioning System (HPS), a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment.
We show that our optimization-based integration exploits the benefits of both, resulting in pose accuracy free of drift.
HPS could be used for VR/AR applications where humans interact with the scene without requiring direct line of sight with an external camera.
arXiv Detail & Related papers (2021-03-31T17:58:31Z)
- Synergetic Reconstruction from 2D Pose and 3D Motion for Wide-Space Multi-Person Video Motion Capture in the Wild [3.0015034534260665]
We propose a markerless motion capture method that achieves accurate and smooth results from multiple cameras.
The proposed method predicts each person's 3D pose and determines the bounding boxes in the multi-camera images.
We evaluated the proposed method using various datasets and a real sports field.
arXiv Detail & Related papers (2020-01-16T02:14:59Z)
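As referenced in the Scene-Aware 3D Multi-Human Motion Capture entry above, the following is a minimal sketch of one way to recover a per-person depth scale from normalized disparity, assuming a depth model of the form depth = s / disparity and a known bone-length prior. The function names, scale grid, and prior are illustrative assumptions, not the paper's implementation.

import numpy as np

def backproject(uv, depth, K):
    # Lift a pixel (u, v) with metric depth into camera coordinates
    # using the pinhole intrinsics K.
    u, v = uv
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.array([x, y, depth])

def solve_person_scale(j_a, j_b, disp_a, disp_b, K, prior_length,
                       scales=np.linspace(0.5, 20.0, 400)):
    # Pick the depth scale s (depth = s / disparity) whose back-projected
    # bone length |p_a - p_b| best matches the prior bone length.
    errors = [abs(np.linalg.norm(backproject(j_a, s / disp_a, K)
                                 - backproject(j_b, s / disp_b, K))
                  - prior_length)
              for s in scales]
    return scales[int(np.argmin(errors))]

In practice the estimate would be aggregated over many bones and frames, and solved jointly with the poses rather than by grid search; the grid search here only illustrates the geometric constraint.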