Hierarchical and Multimodal Data for Daily Activity Understanding
- URL: http://arxiv.org/abs/2504.17696v2
- Date: Fri, 25 Apr 2025 16:07:50 GMT
- Title: Hierarchical and Multimodal Data for Daily Activity Understanding
- Authors: Ghazal Kaviani, Yavuz Yarici, Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib, Mashhour Solh, Ameya Patil
- Abstract summary: Daily Activity Recordings for Artificial Intelligence (DARai) is a multimodal dataset constructed to understand human activities in real-world settings. DARai consists of continuous scripted and unscripted recordings of 50 participants in 10 different environments, totaling over 200 hours of data. Experiments with various machine learning models showcase the value of DARai in uncovering important challenges in human-centered applications.
- Score: 11.200514097148776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Daily Activity Recordings for Artificial Intelligence (DARai, pronounced "Dahr-ree") is a multimodal, hierarchically annotated dataset constructed to understand human activities in real-world settings. DARai consists of continuous scripted and unscripted recordings of 50 participants in 10 different environments, totaling over 200 hours of data from 20 sensors including multiple camera views, depth and radar sensors, wearable inertial measurement units (IMUs), electromyography (EMG), insole pressure sensors, biomonitor sensors, and a gaze tracker. To capture the complexity in human activities, DARai is annotated at three levels of hierarchy: (i) high-level activities (L1) that are independent tasks, (ii) lower-level actions (L2) that are patterns shared between activities, and (iii) fine-grained procedures (L3) that detail the exact execution steps for actions. The dataset annotations and recordings are designed so that 22.7% of L2 actions are shared between L1 activities and 14.2% of L3 procedures are shared between L2 actions. The overlap and the unscripted nature of DARai allow for counterfactual activities in the dataset. Experiments with various machine learning models showcase the value of DARai in uncovering important challenges in human-centered applications. Specifically, we conduct unimodal and multimodal sensor fusion experiments for recognition, temporal localization, and future action anticipation across all hierarchical annotation levels. To highlight the limitations of individual sensors, we also conduct domain-variant experiments that are enabled by DARai's multi-sensor setup and counterfactual activity design. The code, documentation, and dataset are available at the dedicated DARai website: https://alregib.ece.gatech.edu/software-and-datasets/darai-daily-activity-recordings-for-artificial-intelligence-and-machine-learning/
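For readers who want to work with the annotation hierarchy programmatically, the following is a minimal sketch of how the three levels could be represented in code; the class names and example labels here are hypothetical, not the dataset's actual schema, which is documented on the DARai website.

    # Hypothetical representation of DARai's three-level hierarchy. Field and
    # label names are illustrative assumptions, not the published schema.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Procedure:          # L3: fine-grained execution step
        name: str
        start_s: float
        end_s: float

    @dataclass
    class Action:             # L2: pattern that may be shared across activities
        name: str
        procedures: List[Procedure] = field(default_factory=list)

    @dataclass
    class Activity:           # L1: independent high-level task
        name: str
        actions: List[Action] = field(default_factory=list)

    # Example: a hypothetical "grate_cheese" action appearing under two L1
    # activities, mirroring the 22.7% of L2 actions shared between activities.
    grate = Action("grate_cheese", [Procedure("pick_up_grater", 0.0, 1.5)])
    sandwich = Activity("make_sandwich", [grate])
    salad = Activity("make_salad", [grate])

Sharing one Action object across two Activity objects, as above, is one way to mirror the overlap that enables DARai's counterfactual comparisons.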
Related papers
- DISCOVER: Data-driven Identification of Sub-activities via Clustering and Visualization for Enhanced Activity Recognition in Smart Homes [52.09869569068291]
We introduce DISCOVER, a method to discover fine-grained human sub-activities from unlabeled sensor data without relying on pre-segmentation.
We demonstrate its effectiveness through a re-annotation exercise on widely used HAR datasets.
arXiv Detail & Related papers (2025-02-11T20:02:24Z)
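As a rough illustration of the segmentation-free idea in the DISCOVER summary above, the sketch below clusters overlapping sliding windows of unlabeled sensor data into candidate sub-activities; the window length, features, and cluster count are assumptions made for this example, not the paper's settings.

    # Illustrative only: cluster sliding windows of unlabeled sensor data into
    # candidate sub-activities without pre-segmentation.
    import numpy as np
    from sklearn.cluster import KMeans

    def sliding_windows(signal: np.ndarray, width: int, stride: int) -> np.ndarray:
        """Slice a (T, channels) signal into overlapping (width, channels) windows."""
        starts = range(0, len(signal) - width + 1, stride)
        return np.stack([signal[s:s + width] for s in starts])

    rng = np.random.default_rng(0)
    imu = rng.standard_normal((5000, 6))            # stand-in for a 6-axis IMU stream

    windows = sliding_windows(imu, width=100, stride=50)
    feats = np.concatenate([windows.mean(axis=1),   # simple per-window statistics
                            windows.std(axis=1)], axis=1)

    labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(feats)
    # Consecutive windows sharing a cluster id form candidate sub-activity segments.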
- Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild [66.34146236875822]
The Nymeria dataset is a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices.
It contains 1200 recordings of 300 hours of daily activities from 264 participants across 50 locations, travelling a total of 399 km.
The motion-language descriptions provide 310.5K sentences in 8.64M words from a vocabulary of 6,545 words.
arXiv Detail & Related papers (2024-06-14T10:23:53Z)
- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World [55.878173953175356]
We propose MultiPLY, a multisensory embodied large language model.
We first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data samples.
We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks.
arXiv Detail & Related papers (2024-01-16T18:59:45Z)
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone on various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z)
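To make the occupancy-prediction pretext task named in the SPOT summary concrete, here is a minimal sketch in which a small network predicts per-voxel occupancy and is trained with binary cross-entropy; the architecture and voxelization are illustrative assumptions, not SPOT's actual design.

    # Illustrative occupancy-prediction pre-training: a tiny 3D network maps
    # voxel features to per-voxel occupancy logits. Not SPOT's architecture.
    import torch
    import torch.nn as nn

    class OccupancyHead(nn.Module):
        def __init__(self, in_ch: int = 16):
            super().__init__()
            self.backbone = nn.Sequential(         # stand-in for a LiDAR backbone
                nn.Conv3d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv3d(32, 32, 3, padding=1), nn.ReLU())
            self.head = nn.Conv3d(32, 1, 1)        # one occupancy logit per voxel

        def forward(self, voxels: torch.Tensor) -> torch.Tensor:
            return self.head(self.backbone(voxels))

    model = OccupancyHead()
    voxels = torch.randn(2, 16, 32, 32, 32)        # (batch, channels, X, Y, Z)
    occupied = (torch.rand(2, 1, 32, 32, 32) > 0.5).float()  # pseudo ground truth

    loss = nn.BCEWithLogitsLoss()(model(voxels), occupied)
    loss.backward()                                # backbone now carries gradients

After pre-training on a pretext loss like this, the backbone weights would be reused and fine-tuned on downstream tasks such as detection or segmentation.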
- Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation for autonomous vehicles [63.20765930558542]
3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization.
We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large-scale, production-grade operational domain.
It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds.
arXiv Detail & Related papers (2023-02-16T13:41:19Z)
- Unsupervised Deep Learning-based clustering for Human Activity Recognition [8.716606664673982]
The paper proposes DISC (Deep Inertial Sensory Clustering), a DL-based clustering architecture that automatically labels multi-dimensional inertial signals.
The architecture combines a recurrent autoencoder with a clustering criterion to group unlabelled human-activity signals.
The experiments demonstrate the effectiveness of DISC on both clustering accuracy and normalized mutual information metrics.
arXiv Detail & Related papers (2022-11-10T10:56:47Z)
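A minimal sketch of the recurrent-autoencoder-plus-clustering recipe named in the DISC summary above: a GRU autoencoder compresses inertial windows into embeddings, which are then clustered. The layer sizes and the k-means step are assumptions for illustration, not DISC's published design.

    # Illustrative recurrent autoencoder over inertial windows, followed by
    # k-means on the learned embeddings. Hyperparameters are assumptions.
    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans

    class GRUAutoEncoder(nn.Module):
        def __init__(self, channels: int = 6, hidden: int = 32):
            super().__init__()
            self.encoder = nn.GRU(channels, hidden, batch_first=True)
            self.decoder = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, channels)

        def forward(self, x: torch.Tensor):
            _, h = self.encoder(x)                        # h: (1, batch, hidden)
            z = h[-1]                                     # per-window embedding
            rep = z.unsqueeze(1).repeat(1, x.size(1), 1)  # embedding at each step
            dec, _ = self.decoder(rep)
            return self.out(dec), z                       # reconstruction + embedding

    model = GRUAutoEncoder()
    windows = torch.randn(64, 100, 6)                     # 64 windows, 100 IMU samples
    recon, z = model(windows)
    loss = nn.MSELoss()(recon, windows)                   # reconstruction objective
    loss.backward()

    clusters = KMeans(n_clusters=6, n_init=10).fit_predict(z.detach().numpy())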
- UMSNet: An Universal Multi-sensor Network for Human Activity Recognition [10.952666953066542]
This paper proposes a universal multi-sensor network (UMSNet) for human activity recognition.
In particular, we propose a new lightweight sensor residual block (called the LSR block), which improves performance.
Our framework has a clear structure and can be directly applied to various types of multimodal time-series classification tasks.
arXiv Detail & Related papers (2022-05-24T03:29:54Z)
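For intuition, below is a generic lightweight 1-D residual block over sensor channels in the spirit of the LSR block mentioned above; the exact LSR design is not described in the summary, so every layer choice here is an assumption.

    # A generic lightweight residual block for sensor time series. This is NOT
    # the published LSR block; it only illustrates the residual-block idea.
    import torch
    import torch.nn as nn

    class LightweightResidualBlock(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=3, padding=1,
                          groups=channels),                   # depthwise: few params
                nn.Conv1d(channels, channels, kernel_size=1), # pointwise mixing
                nn.BatchNorm1d(channels))
            self.act = nn.ReLU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.act(x + self.body(x))                 # residual connection

    block = LightweightResidualBlock(channels=6)
    imu = torch.randn(8, 6, 128)                              # (batch, channels, time)
    print(block(imu).shape)                                   # torch.Size([8, 6, 128])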
- CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors [1.0742675209112622]
CZU-MHAD (Changzhou University: a comprehensive multimodal human action dataset) consists of 22 actions captured in three temporally synchronized modalities.
These modalities include depth videos and skeleton positions from a Kinect v2 camera, and inertial signals from 10 wearable sensors.
arXiv Detail & Related papers (2022-02-07T15:17:08Z)
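Temporal synchronization across heterogeneous modalities like these typically comes down to aligning samples by timestamp. The sketch below pairs each depth frame with the nearest IMU sample; the sampling rates are illustrative assumptions, not CZU-MHAD's actual hardware specification.

    # Illustrative nearest-timestamp alignment between a 30 Hz depth stream and
    # a 100 Hz IMU stream. Rates are assumptions for the sketch.
    import numpy as np

    depth_ts = np.arange(0.0, 10.0, 1 / 30)   # depth frame timestamps (seconds)
    imu_ts = np.arange(0.0, 10.0, 1 / 100)    # IMU sample timestamps (seconds)

    # For each depth frame, find the index of the IMU sample closest in time.
    idx = np.searchsorted(imu_ts, depth_ts)
    idx = np.clip(idx, 1, len(imu_ts) - 1)
    prev_closer = (depth_ts - imu_ts[idx - 1]) < (imu_ts[idx] - depth_ts)
    idx = idx - prev_closer.astype(int)

    max_skew = np.max(np.abs(imu_ts[idx] - depth_ts))
    print(f"worst-case alignment error: {max_skew * 1000:.1f} ms")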
- Using Language Model to Bootstrap Human Activity Recognition Ambient Sensors Based in Smart Homes [2.336163487623381]
We propose two Natural Language Processing embedding methods to enhance LSTM-based structures in activity-sequence classification tasks.
Results indicate that this approach provides useful information, such as a sensor organization map.
Our tests show that the embeddings can be pretrained on different datasets than the target one, enabling transfer learning.
arXiv Detail & Related papers (2021-11-23T21:21:14Z)
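The idea of feeding ambient-sensor events as "words" through an NLP-style embedding into an LSTM, as in the entry above, can be sketched as follows; the vocabulary size, dimensions, and class count are assumptions for illustration, not the paper's configuration.

    # Illustrative sketch: treat sensor events as tokens, embed them, and
    # classify the activity of the whole sequence with an LSTM.
    import torch
    import torch.nn as nn

    class SensorSequenceClassifier(nn.Module):
        def __init__(self, vocab: int = 50, emb: int = 16,
                     hidden: int = 32, n_classes: int = 10):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)   # could be pretrained on another
                                                    # dataset, enabling transfer
            self.lstm = nn.LSTM(emb, hidden, batch_first=True)
            self.fc = nn.Linear(hidden, n_classes)

        def forward(self, events: torch.Tensor) -> torch.Tensor:
            _, (h, _) = self.lstm(self.embed(events))
            return self.fc(h[-1])                   # logits for the whole sequence

    model = SensorSequenceClassifier()
    events = torch.randint(0, 50, (4, 25))          # 4 sequences of 25 sensor events
    print(model(events).shape)                      # torch.Size([4, 10])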
- IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor and dynamic maps of the environment.
Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)
- DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS).
The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development.
In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
arXiv Detail & Related papers (2020-08-27T12:33:54Z)
- Sequential Weakly Labeled Multi-Activity Localization and Recognition on Wearable Sensors using Recurrent Attention Networks [13.64024154785943]
We propose a recurrent attention network (RAN) to handle sequential weakly labeled multi-activity recognition and localization tasks.
Our RAN model can simultaneously infer multiple activity types from coarse-grained sequential weak labels.
This greatly reduces the burden of manual labeling.
arXiv Detail & Related papers (2020-04-13T04:57:09Z)
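As a closing illustration of the recurrent-attention idea in the last entry, the sketch below scores each timestep of a recurrent encoding with learned attention weights and pools them so that a single sequence-level (weak) label can supervise training; all dimensions are assumptions, and this is not the published RAN.

    # Illustrative attention pooling for weakly labeled activity sequences:
    # an LSTM encodes the stream, attention weights pool the timesteps, and one
    # coarse sequence-level label supervises the model.
    import torch
    import torch.nn as nn

    class AttentionPoolingHAR(nn.Module):
        def __init__(self, channels: int = 6, hidden: int = 32, n_classes: int = 5):
            super().__init__()
            self.lstm = nn.LSTM(channels, hidden, batch_first=True)
            self.attn = nn.Linear(hidden, 1)        # one relevance score per timestep
            self.fc = nn.Linear(hidden, n_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h, _ = self.lstm(x)                     # (batch, time, hidden)
            w = torch.softmax(self.attn(h), dim=1)  # attention over time
            pooled = (w * h).sum(dim=1)             # weighted sum of timesteps
            return self.fc(pooled)

    model = AttentionPoolingHAR()
    x = torch.randn(4, 200, 6)                      # 4 sequences of 200 IMU samples
    weak_labels = torch.randint(0, 5, (4,))         # one coarse label per sequence
    loss = nn.CrossEntropyLoss()(model(x), weak_labels)
    loss.backward()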
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.