JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and
Tracking
- URL: http://arxiv.org/abs/2210.11940v1
- Date: Thu, 20 Oct 2022 07:14:37 GMT
- Title: JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and
Tracking
- Authors: Edward Vendrow, Duy Tho Le and Hamid Rezatofighi
- Abstract summary: We introduce JRDB-Pose, a large-scale dataset for multi-person pose estimation and tracking.
The dataset contains challenging scenes with crowded indoor and outdoor locations.
JRDB-Pose provides human pose annotations with per-keypoint occlusion labels and track IDs consistent across the scene.
- Score: 6.789370732159177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous robotic systems operating in human environments must understand
their surroundings to make accurate and safe decisions. In crowded human scenes
with close-up human-robot interaction and robot navigation, a deep
understanding requires reasoning about human motion and body dynamics over time
with human body pose estimation and tracking. However, existing datasets either
do not provide pose annotations or include scene types unrelated to robotic
applications. Many datasets also lack the diversity of poses and occlusions
found in crowded human scenes. To address this limitation, we introduce
JRDB-Pose, a large-scale dataset and benchmark for multi-person pose estimation
and tracking using videos captured from a social navigation robot. The dataset
contains challenging scenes with crowded indoor and outdoor locations and a
diverse range of scales and occlusion types. JRDB-Pose provides human pose
annotations with per-keypoint occlusion labels and track IDs consistent across
the scene. A public evaluation server is made available for fair evaluation on
a held-out test set. JRDB-Pose is available at https://jrdb.erc.monash.edu/ .
Related papers
- JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments [33.85323884177833]
JRDB-PanoTrack is a novel open-world panoptic segmentation and tracking benchmark for environment understanding in robot systems.
JRDB-PanoTrack includes diverse data covering crowded indoor and outdoor scenes, with comprehensive 2D and 3D synchronized data modalities.
It also provides various object classes for closed- and open-world recognition benchmarks, with OSPA-based metrics for evaluation (a minimal OSPA sketch follows this entry).
arXiv Detail & Related papers (2024-04-02T06:43:22Z)
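The JRDB-PanoTrack summary above mentions OSPA-based metrics; as background, the Python sketch below computes the generic OSPA set distance (cut-off c, order p) via an optimal assignment between two point sets. It is a minimal illustration of the metric family, not the benchmark's exact evaluation code, and the default parameter values are arbitrary assumptions.

import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=1.0, p=1):
    """Generic OSPA distance between point sets X (m x d) and Y (n x d)."""
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m == 0 or n == 0:
        return float(c)                       # pure cardinality error
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    if m > n:
        X, Y, m, n = Y, X, n, m               # assign the smaller set into the larger one
    D = np.minimum(np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1), c)
    row, col = linear_sum_assignment(D ** p)  # optimal sub-pattern assignment
    loc = float((D[row, col] ** p).sum())     # localization error of matched points
    card = (c ** p) * (n - m)                 # penalty for unmatched points
    return ((loc + card) / n) ** (1.0 / p)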
- Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset [52.22758311559]
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot.
The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors.
The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users.
arXiv Detail & Related papers (2024-03-21T14:53:50Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks.
In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database.
arXiv Detail & Related papers (2023-12-05T12:03:00Z)
- FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions [25.42369471193405]
FreeMan is the first large-scale, multi-view dataset collected under real-world conditions.
It comprises 11M frames from 8000 sequences, viewed from different perspectives.
These sequences cover 40 subjects across 10 different scenarios, each with varying lighting conditions.
arXiv Detail & Related papers (2023-09-10T16:42:11Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- JRDB-Act: A Large-scale Multi-modal Dataset for Spatio-temporal Action, Social Group and Activity Detection [54.696819174421584]
We introduce JRDB-Act, a multi-modal dataset that reflects a real distribution of human daily life actions in a university campus environment.
JRDB-Act has been densely annotated with atomic actions and comprises over 2.8M action labels.
JRDB-Act comes with social group identification annotations conducive to the task of grouping individuals based on their interactions in the scene.
arXiv Detail & Related papers (2021-06-16T14:43:46Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.