JRDB-Act: A Large-scale Multi-modal Dataset for Spatio-temporal Action,
Social Group and Activity Detection
- URL: http://arxiv.org/abs/2106.08827v1
- Date: Wed, 16 Jun 2021 14:43:46 GMT
- Authors: Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid
Rezatofighi
- Abstract summary: We introduce JRDB-Act, a multi-modal dataset that reflects a real distribution of human daily life actions in a university campus environment.
JRDB-Act is densely annotated with atomic actions and comprises over 2.8M action labels.
JRDB-Act comes with social group identification annotations conducive to the task of grouping individuals based on their interactions in the scene.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The availability of large-scale video action understanding datasets has
facilitated advances in the interpretation of visual scenes containing people.
However, learning to recognize human activities in an unconstrained real-world
environment, with potentially highly unbalanced and long-tailed data
distributions, remains a significant challenge, not least owing to the lack of
a reflective large-scale dataset. Most existing large-scale datasets are either
collected from a specific or constrained environment, e.g. kitchens or rooms,
or video sharing platforms such as YouTube. In this paper, we introduce
JRDB-Act, a multi-modal dataset, as an extension of the existing JRDB, which is
captured by a social mobile manipulator and reflects a real distribution of
human daily life actions in a university campus environment. JRDB-Act has been
densely annotated with atomic actions and comprises over 2.8M action labels,
constituting a large-scale spatio-temporal action detection dataset. Each human
bounding box is labelled with one pose-based action label and multiple
(optional) interaction-based action labels. Moreover, JRDB-Act comes with social
group identification annotations conducive to the task of grouping individuals
based on their interactions in the scene to infer their social activities
(common activities in each social group).
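To make the label schema concrete, the sketch below shows a hypothetical per-person annotation record and one way a group's social activity could be derived as the action labels shared by its members. The field names, types, and the intersection rule are illustrative assumptions for this summary, not the dataset's released format.

```python
# A hypothetical sketch of the JRDB-Act label schema described above.
# Field names, types, and the "shared labels" rule are illustrative
# assumptions, not the dataset's released format.
from dataclasses import dataclass, field

@dataclass
class PersonAnnotation:
    box: tuple                 # (x1, y1, x2, y2) bounding box in pixels
    pose_action: str           # exactly one pose-based label, e.g. "standing"
    interactions: list = field(default_factory=list)  # optional interaction-based labels
    group_id: int = -1         # social-group membership (-1 = ungrouped)

def social_activity(members):
    """Common activities of a social group: action labels shared by all members."""
    label_sets = [{m.pose_action, *m.interactions} for m in members]
    return set.intersection(*label_sets) if label_sets else set()

# Illustrative frame with a single two-person social group.
frame = [
    PersonAnnotation((10, 20, 60, 180), "standing", ["talking to someone"], group_id=0),
    PersonAnnotation((70, 22, 120, 178), "standing", ["listening to someone"], group_id=0),
]
group0 = [p for p in frame if p.group_id == 0]
print(social_activity(group0))  # -> {'standing'}
```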
Related papers
- JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
JRDB-Social fills gaps in human understanding across diverse indoor and outdoor social contexts.
This dataset aims to enhance our grasp of human social dynamics for robotic applications.
arXiv Detail & Related papers (2024-04-06T00:33:39Z)
- Human-centric Scene Understanding for 3D Large-scale Scenarios
We present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife.
Our HuCenLife can benefit many 3D perception tasks, such as segmentation, detection, action recognition, etc.
arXiv Detail & Related papers (2023-07-26T08:40:46Z)
- Multi-Environment Pretraining Enables Transfer to Action Limited Datasets
In reinforcement learning, available decision-making data is often not annotated with actions.
We propose combining large but sparsely-annotated datasets from a target environment of interest with fully-annotated datasets from various other source environments.
We show that utilizing even one additional environment's labelled sequential data during inverse dynamics model (IDM) pretraining yields substantial improvements in generating action labels for unannotated sequences.
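As a rough sketch of that idea (illustrative state/action shapes and a generic classifier standing in for the paper's model, not its actual implementation), an inverse dynamics model can be fit to predict the action taken between consecutive states on labelled data and then used to pseudo-label unannotated target sequences:

```python
# Minimal IDM sketch: predict a_t from the state pair (s_t, s_{t+1}).
# Shapes, the classifier choice, and the random data are assumptions
# made for illustration only.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical fully-annotated source-environment trajectory.
source_states = rng.normal(size=(1000, 8))     # s_0 ... s_999
source_actions = rng.integers(0, 4, size=999)  # a_t taken between s_t and s_{t+1}

# IDM input: concatenated consecutive state pairs.
pairs = np.concatenate([source_states[:-1], source_states[1:]], axis=1)
idm = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(pairs, source_actions)

# Generate pseudo action labels for an unannotated target-environment trajectory.
target_states = rng.normal(size=(200, 8))
target_pairs = np.concatenate([target_states[:-1], target_states[1:]], axis=1)
pseudo_actions = idm.predict(target_pairs)     # one label per transition
```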
arXiv Detail & Related papers (2022-11-23T22:48:22Z)
- JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking
We introduce JRDB-Pose, a large-scale dataset for multi-person pose estimation and tracking.
The dataset contains challenging scenes in crowded indoor and outdoor locations.
JRDB-Pose provides human pose annotations with per-keypoint occlusion labels and track IDs consistent across the scene.
arXiv Detail & Related papers (2022-10-20T07:14:37Z)
- Towards Rich, Portable, and Large-Scale Pedestrian Data Collection
We propose a portable data collection system that facilitates accessible large-scale data collection in diverse environments.
We introduce the first batch of dataset from the ongoing data collection effort -- the TBD pedestrian dataset.
Compared with existing pedestrian datasets, our dataset offers three distinguishing components: human-verified labels grounded in the metric space, a combination of top-down and perspective views, and naturalistic human behavior in the presence of a socially appropriate "robot".
arXiv Detail & Related papers (2022-03-03T19:28:10Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
IKEA ASM is a three-million-frame, multi-view furniture-assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
- Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events
We present a new large-scale dataset with comprehensive annotations, named Human-in-Events or HiEve.
It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest numbers of long-duration trajectories.
Based on its diverse annotation, we present two simple baselines for action recognition and pose estimation.
arXiv Detail & Related papers (2020-05-09T18:24:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.