ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily
Living
- URL: http://arxiv.org/abs/2402.17758v1
- Date: Tue, 27 Feb 2024 18:51:52 GMT
- Title: ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily
Living
- Authors: Marsil Zakour, Partha Pratim Nath, Ludwig Lohmer, Emre Faik Gökçe,
Martin Piccolrovazzi, Constantin Patsch, Yuankai Wu, Rahul Chaudhari,
Eckehard Steinbach
- Abstract summary: ADL4D is a dataset of up to two subjects interacting with different sets of objects performing Activities of Daily Living (ADL)
Our dataset consists of 75 sequences with a total of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action annotations.
We develop an automatic system for multi-view multi-hand 3D pose annotation capable of tracking hand poses over time.
- Score: 4.221961702292134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hand-Object Interactions (HOIs) are conditioned on spatial and temporal
contexts like surrounding objects, previous actions, and future intents (for
example, grasping and handover actions vary greatly based on objects' proximity
and trajectory obstruction). However, existing datasets for 4D HOI (3D HOI over
time) are limited to one subject interacting with one object only. This
restricts the generalization of learning-based HOI methods trained on those
datasets. We introduce ADL4D, a dataset of up to two subjects interacting
with different sets of objects performing Activities of Daily Living (ADL) like
breakfast or lunch preparation activities. The transition between multiple
objects to complete a task over time introduces a unique context that is
lacking in existing datasets. Our dataset consists of 75 sequences with a total
of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action
annotations. We develop an automatic system for multi-view multi-hand 3D pose
annotation capable of tracking hand poses over time. We integrate and test
it against publicly available datasets. Finally, we evaluate our dataset on the
tasks of Hand Mesh Recovery (HMR) and Hand Action Segmentation (HAS).
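The abstract does not spell out how the multi-view multi-hand annotation system works. As a rough, hypothetical illustration of the kind of geometry such multi-camera annotation pipelines commonly rely on, the sketch below lifts per-view 2D hand keypoints to 3D with Direct Linear Transform (DLT) triangulation. The function names, the 21-joint hand layout, and the NumPy implementation are assumptions for illustration only, not details taken from the paper.

    import numpy as np

    def triangulate_point(proj_mats, points_2d):
        # Direct Linear Transform: each calibrated view contributes two
        # linear constraints on the homogeneous 3D point, built from its
        # 3x4 projection matrix P and the observed pixel (u, v).
        rows = []
        for P, (u, v) in zip(proj_mats, points_2d):
            rows.append(u * P[2] - P[0])
            rows.append(v * P[2] - P[1])
        A = np.stack(rows)                      # shape (2 * n_views, 4)
        # The solution is the right singular vector with the smallest
        # singular value, dehomogenized to Euclidean coordinates.
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]
        return X[:3] / X[3]

    def triangulate_hand(proj_mats, keypoints_2d):
        # keypoints_2d: (n_views, 21, 2) pixel coordinates of one hand's
        # joints, one row of joints per camera view (21 joints assumed).
        keypoints_2d = np.asarray(keypoints_2d, dtype=float)
        return np.stack([triangulate_point(proj_mats, keypoints_2d[:, j])
                         for j in range(keypoints_2d.shape[1])])

In a real annotation system, such per-frame triangulations would feed a temporal tracker and a parametric hand model fit; those stages are beyond this sketch.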
Related papers
- Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics [43.30868393851785]
HOGraspNet is a training dataset for 3D hand-object interaction.
The dataset includes diverse hand shapes from 99 participants aged 10 to 74.
It offers labels for 3D hand and object meshes, 3D keypoints, contact maps, and grasp labels.
arXiv Detail & Related papers (2024-09-06T05:49:38Z)
- HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects [86.86284624825356]
HIMO is a dataset of full-body human interacting with multiple objects.
HIMO contains 3.3K 4D HOI sequences and 4.08M 3D HOI frames.
arXiv Detail & Related papers (2024-07-17T07:47:34Z)
- ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions [11.32229757116179]
We introduce the ParaHome system, designed to capture dynamic 3D movements of humans and objects within a common home environment.
By leveraging the ParaHome system, we collect a novel large-scale dataset of human-object interaction.
arXiv Detail & Related papers (2024-01-18T18:59:58Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- ATTACH Dataset: Annotated Two-Handed Assembly Actions for Human Action Understanding [8.923830513183882]
We present the ATTACH dataset, which contains 51.6 hours of assembly with 95.2k annotated fine-grained actions monitored by three cameras.
In the ATTACH dataset, more than 68% of annotations overlap with other annotations, which is many times more than in related datasets.
We report the performance of state-of-the-art methods for action recognition as well as action detection on video and skeleton-sequence inputs.
arXiv Detail & Related papers (2023-04-17T12:31:24Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- Semi-automatic 3D Object Keypoint Annotation and Detection for the Masses [42.34064154798376]
We present a semi-automatic way of collecting and labeling datasets using a wrist-mounted camera on a standard robotic arm.
We are able to obtain a working 3D object keypoint detector and go through the whole process of data collection, annotation, and learning in just a couple of hours of active time.
arXiv Detail & Related papers (2022-01-19T15:41:54Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)