Recognition of Daily Activities through Multi-Modal Deep Learning: A Video, Pose, and Object-Aware Approach for Ambient Assisted Living
- URL: http://arxiv.org/abs/2603.04509v1
- Date: Wed, 04 Mar 2026 19:00:34 GMT
- Title: Recognition of Daily Activities through Multi-Modal Deep Learning: A Video, Pose, and Object-Aware Approach for Ambient Assisted Living
- Authors: Kooshan Hashemifard, Pau Climent-Pérez, Francisco Florez-Revuelta
- Abstract summary: This paper presents a multi-modal approach for the recognition of activities of daily living tailored for older adults within AAL settings. The proposed system integrates visual information processed by a 3D Convolutional Neural Network (CNN) with 3D human pose data analyzed by a Graph Convolutional Network. The results indicate that the proposed system achieves competitive classification accuracy for a range of daily activities.
- Score: 5.0149699000056644
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recognition of daily activities is a critical element for effective Ambient Assisted Living (AAL) systems, particularly to monitor the well-being and support the independence of older adults in indoor environments. However, developing robust activity recognition systems faces significant challenges, including intra-class variability, inter-class similarity, environmental variability, camera perspectives, and scene complexity. This paper presents a multi-modal approach for the recognition of activities of daily living tailored for older adults within AAL settings. The proposed system integrates visual information processed by a 3D Convolutional Neural Network (CNN) with 3D human pose data analyzed by a Graph Convolutional Network. Contextual information, derived from an object detection module, is fused with the 3D CNN features using a cross-attention mechanism to enhance recognition accuracy. This method is evaluated using the Toyota SmartHome dataset, which consists of real-world indoor activities. The results indicate that the proposed system achieves competitive classification accuracy for a range of daily activities, highlighting its potential as an essential component for advanced AAL monitoring solutions. This advancement supports the broader goal of developing intelligent systems that promote safety and autonomy among older adults.
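The fusion step described in the abstract lends itself to a short illustration. Below is a minimal PyTorch sketch of cross-attention between pooled 3D CNN clip features (queries) and object-detection embeddings (keys/values); the module names, dimensions, and residual design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuse pooled 3D CNN clip features (queries) with detected-object
    embeddings (keys/values) via multi-head cross-attention.
    Illustrative dimensions only; not the paper's implementation."""

    def __init__(self, video_dim=512, object_dim=256, num_heads=8):
        super().__init__()
        self.obj_proj = nn.Linear(object_dim, video_dim)  # align object dim to video dim
        self.attn = nn.MultiheadAttention(video_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(video_dim)

    def forward(self, video_feats, object_feats):
        # video_feats:  (B, T, video_dim)  temporal tokens from the 3D CNN
        # object_feats: (B, N, object_dim) embeddings of N detected objects
        kv = self.obj_proj(object_feats)
        attended, _ = self.attn(query=video_feats, key=kv, value=kv)
        return self.norm(video_feats + attended)  # residual fusion

# Toy usage: batch of 2 clips, 4 temporal tokens, 5 detected objects each.
fused = CrossAttentionFusion()(torch.randn(2, 4, 512), torch.randn(2, 5, 256))
print(fused.shape)  # torch.Size([2, 4, 512])
```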
Related papers
- A Study on Real-time Object Detection using Deep Learning [0.0]
This article describes in detail how deep learning algorithms are used to enhance real-time object detection. It provides information on the available object detection models, open benchmark datasets, and studies on the use of object detection models in a range of applications.
arXiv Detail & Related papers (2026-02-17T18:12:42Z) - Integrating Temporal Context into Streaming Data for Human Activity Recognition in Smart Home [3.1032184155196982]
Human Activity Recognition (HAR) from passive sensors mostly relies on traditional machine learning. We tackle this by clustering activities into morning, afternoon, and night. We propose to extend the feature vector by incorporating time of day and day of week as cyclical temporal features.
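Cyclical temporal features are typically built with sine/cosine pairs so that the encoding wraps around midnight and across the week. A minimal sketch follows; the function name is an assumption, not from the paper.

```python
import numpy as np

def cyclical_features(hour, weekday):
    """Map time of day (0-23) and day of week (0-6) onto the unit circle so
    that adjacent values (e.g. 23:00 and 00:00) stay numerically close."""
    return np.array([
        np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24),
        np.sin(2 * np.pi * weekday / 7), np.cos(2 * np.pi * weekday / 7),
    ])

# 23:00 and 00:00 differ far less here than in a raw 23-vs-0 encoding.
print(cyclical_features(23, 6))
print(cyclical_features(0, 0))
```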
arXiv Detail & Related papers (2026-01-09T09:47:06Z) - PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments [36.84821207878773]
Visual reasoning in multimodal large language models (MLLMs) has primarily been studied in static, fully observable settings. We introduce the Active Visual Reasoning (AVR) task, extending visual reasoning to partially observable, interactive environments. We present a benchmark featuring multi-round interactive environments designed to assess both reasoning and information-gathering efficiency.
arXiv Detail & Related papers (2025-10-24T02:59:00Z) - Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO [63.140883026848286]
Active vision refers to the process of actively selecting where and how to look in order to gather task-relevant information. Recently, the use of Multimodal Large Language Models (MLLMs) as central planning and decision-making modules in robotic systems has gained extensive attention.
arXiv Detail & Related papers (2025-05-27T17:29:31Z) - Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook [19.539295469044813]
This study emphasizes the importance of robustness, alongside accuracy and latency, in evaluating perception systems under practical scenarios.
Our work presents an extensive survey of camera-only, LiDAR-only, and multi-modal 3D object detection algorithms, thoroughly evaluating their trade-off between accuracy, latency, and robustness.
Among these, multi-modal 3D detection approaches exhibit superior robustness, and a novel taxonomy is introduced to reorganize the literature for enhanced clarity.
arXiv Detail & Related papers (2024-01-12T12:35:45Z) - Student Activity Recognition in Classroom Environments using Transfer Learning [0.0]
This paper proposes a system for detecting and recognizing the activities of students in a classroom environment.
Xception achieved an accuracy of 93%, on the novel classroom dataset.
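A transfer-learning setup of this kind is commonly expressed as a frozen pretrained backbone plus a small trainable head. The Keras sketch below is a hedged illustration; the class count, input size, and hyperparameters are placeholders, not values from the paper.

```python
import tensorflow as tf

NUM_CLASSES = 5  # placeholder; the paper's classroom dataset defines the real count

# Pretrained Xception backbone, frozen; only the new classification head trains.
base = tf.keras.applications.Xception(
    include_top=False, weights="imagenet",
    input_shape=(299, 299, 3), pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```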
arXiv Detail & Related papers (2023-12-01T04:51:57Z) - AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z) - A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z) - SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency [122.18108118190334]
We present a framework called Self-supervised Embodied Active Learning (SEAL).
It utilizes perception models trained on internet images to learn an active exploration policy.
We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner.
arXiv Detail & Related papers (2021-12-02T06:26:38Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
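The learned data-association step can be pictured as edge scoring on a bipartite track-detection graph. The toy PyTorch sketch below shows one message-passing round; the dimensions and module names are assumptions and do not reproduce the paper's network.

```python
import torch
import torch.nn as nn

class EdgeMessagePassing(nn.Module):
    """One message-passing round on a bipartite track-detection graph:
    update each edge from its endpoint features, then score it as a match.
    Toy illustration, not the paper's architecture."""

    def __init__(self, node_dim=64, edge_dim=32):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, edge_dim), nn.ReLU())
        self.score = nn.Linear(edge_dim, 1)

    def forward(self, track_feats, det_feats, edge_feats):
        # track_feats: (T, node_dim), det_feats: (D, node_dim)
        # edge_feats:  (T, D, edge_dim), one edge per track-detection pair
        T, D = track_feats.size(0), det_feats.size(0)
        src = track_feats[:, None, :].expand(T, D, -1)
        dst = det_feats[None, :, :].expand(T, D, -1)
        edges = self.edge_mlp(torch.cat([src, dst, edge_feats], dim=-1))
        return torch.sigmoid(self.score(edges)).squeeze(-1)  # (T, D) match probs

probs = EdgeMessagePassing()(torch.randn(3, 64), torch.randn(4, 64),
                             torch.randn(3, 4, 32))
print(probs.shape)  # torch.Size([3, 4])
```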
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection-Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
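The auxiliary-task idea reads naturally as a shared backbone feeding both heads, with detection-derived localization features routed back into the segmentation branch. A toy sketch under that assumption follows; layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class SharedBackboneSegDet(nn.Module):
    """Toy multi-task model: one point-feature backbone feeds an auxiliary
    detection head whose localization features are concatenated back into
    the semantic segmentation head. Sizes are illustrative assumptions."""

    def __init__(self, in_dim=3, feat_dim=64, loc_dim=32, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.det_head = nn.Linear(feat_dim, loc_dim)        # localization features
        self.seg_head = nn.Linear(feat_dim + loc_dim, num_classes)

    def forward(self, points):                              # points: (N, 3)
        feats = self.backbone(points)
        loc = self.det_head(feats)
        return self.seg_head(torch.cat([feats, loc], dim=-1))  # per-point logits

logits = SharedBackboneSegDet()(torch.randn(100, 3))
print(logits.shape)  # torch.Size([100, 10])
```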
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.