Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset
- URL: http://arxiv.org/abs/2403.14447v2
- Date: Sat, 23 Mar 2024 11:26:38 GMT
- Title: Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset
- Authors: Andrea Avogaro, Andrea Toaiari, Federico Cunico, Xiangmin Xu, Haralambos Dafas, Alessandro Vinciarelli, Emma Li, Marco Cristani,
- Abstract summary: We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and Spot.
The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors.
The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users.
- Score: 52.22758311559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because being close to the ground captures humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The Corpus contains not only the recordings of the built-in stereo cameras of Spot, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This leads to ground-truth skeletal representations with a precision lower than a millimeter. In addition, the Corpus includes reproducible benchmarks on 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction, all based on publicly available baseline approaches. This enables future HARPER users to rigorously compare their results with those we provide in this work.
Related papers
- Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction [51.49400490437258]
This work develops a method for imitating articulated object manipulation from a single monocular RGB human demonstration.
We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video.
Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion.
We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot.
arXiv Detail & Related papers (2024-09-26T17:57:16Z) - Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z) - Robots That Can See: Leveraging Human Pose for Trajectory Prediction [30.919756497223343]
We present a Transformer based architecture to predict human future trajectories in human-centric environments.
The resulting model captures the inherent uncertainty for future human trajectory prediction.
We identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error.
arXiv Detail & Related papers (2023-09-29T13:02:56Z) - External Camera-based Mobile Robot Pose Estimation for Collaborative
Perception with Smart Edge Sensors [22.5939915003931]
We present an approach for estimating a mobile robot's pose w.r.t. the allocentric coordinates of a network of static cameras using multi-view RGB images.
The images are processed online, locally on smart edge sensors by deep neural networks to detect the robot.
With the robot's pose precisely estimated, its observations can be fused into the allocentric scene model.
arXiv Detail & Related papers (2023-03-07T11:03:33Z) - Human keypoint detection for close proximity human-robot interaction [29.99153271571971]
We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction.
The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection.
We propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection.
arXiv Detail & Related papers (2022-07-15T20:33:29Z) - BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full body human- object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z) - Single-view robot pose and joint angle estimation via render & compare [40.05546237998603]
We introduce RoboPose, a method to estimate the joint angles and the 6D camera-to-robot pose of a known articulated robot from a single RGB image.
This is an important problem to grant mobile and itinerant autonomous systems the ability to interact with other robots.
arXiv Detail & Related papers (2021-04-19T14:48:29Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Human POSEitioning System (HPS): 3D Human Pose Estimation and
Self-localization in Large Scenes from Body-Mounted Sensors [71.29186299435423]
We introduce (HPS) Human POSEitioning System, a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment.
We show that our optimization-based integration exploits the benefits of the two, resulting in pose accuracy free of drift.
HPS could be used for VR/AR applications where humans interact with the scene without requiring direct line of sight with an external camera.
arXiv Detail & Related papers (2021-03-31T17:58:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.