SMART: Scene-motion-aware human action recognition framework for mental disorder group
- URL: http://arxiv.org/abs/2406.04649v1
- Date: Fri, 7 Jun 2024 05:29:42 GMT
- Title: SMART: Scene-motion-aware human action recognition framework for mental disorder group
- Authors: Zengyuan Lai, Jiarui Yang, Songpengcheng Xia, Qi Wu, Zhen Sun, Wenxian Yu, Ling Pei
- Abstract summary: We propose to build a vision-based Human Action Recognition dataset including abnormal actions often occurring in the mental disorder group.
We then introduce a novel Scene-Motion-aware Action Recognition framework, named SMART, consisting of two technical modules.
The effectiveness of our proposed method has been validated on our self-collected HAR dataset (MentalHAD), achieving 94.9% and 93.1% accuracy on unseen subjects and scenes, and outperforming state-of-the-art approaches by 6.5% and 13.2%, respectively.
- Score: 16.60713558596286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Patients with mental disorders often exhibit risky abnormal actions, such as climbing walls or hitting windows, necessitating intelligent video behavior monitoring for smart healthcare with the rise of Internet of Things (IoT) technology. However, the development of vision-based Human Action Recognition (HAR) for these actions is hindered by the lack of specialized algorithms and datasets. In this paper, we propose to build a vision-based HAR dataset including abnormal actions often occurring in the mental disorder group, and then introduce a novel Scene-Motion-aware Action Recognition Technology framework, named SMART, consisting of two technical modules. First, we propose a scene perception module to extract human motion trajectory and human-scene interaction features, which introduces additional scene information for a supplementary semantic representation of the above actions. Second, the multi-stage fusion module fuses the skeleton motion, motion trajectory, and human-scene interaction features, enhancing the semantic association between the skeleton motion and the above supplementary representation, thus generating a comprehensive representation with both human motion and scene information. The effectiveness of our proposed method has been validated on our self-collected HAR dataset (MentalHAD), achieving 94.9% and 93.1% accuracy on unseen subjects and scenes and outperforming state-of-the-art approaches by 6.5% and 13.2%, respectively. The demonstrated subject- and scene-generalizability makes it possible to migrate SMART to practical deployment in smart healthcare systems for mental disorder patients in medical settings. The code and dataset will be released publicly for further research: https://github.com/Inowlzy/SMART.git.
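The abstract does not give implementation details, so the following is only a minimal sketch of how three streams (skeleton motion, motion trajectory, and human-scene interaction features) might be encoded and fused in stages; the module names, dimensions, and use of GRU encoders are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch of a scene-motion-aware fusion head (not the authors' code).
# Assumes per-frame skeleton features (B, T, 75), a 2-D trajectory (B, T, 2),
# and human-scene interaction features (B, T, 32); all names/dims are illustrative.
import torch
import torch.nn as nn

class MultiStageFusion(nn.Module):
    def __init__(self, skel_dim=75, traj_dim=2, hsi_dim=32, hidden=128, num_classes=10):
        super().__init__()
        # Stage 1: stream-specific encoders project each modality to a shared width.
        self.skel_enc = nn.GRU(skel_dim, hidden, batch_first=True)
        self.traj_enc = nn.GRU(traj_dim, hidden, batch_first=True)
        self.hsi_enc = nn.GRU(hsi_dim, hidden, batch_first=True)
        # Stage 2: fuse trajectory and scene-interaction cues into a scene-context vector.
        self.scene_fuse = nn.Linear(2 * hidden, hidden)
        # Stage 3: associate skeleton motion with the scene context and classify.
        self.final_fuse = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, skel, traj, hsi):
        _, h_skel = self.skel_enc(skel)   # (1, B, hidden)
        _, h_traj = self.traj_enc(traj)
        _, h_hsi = self.hsi_enc(hsi)
        scene = torch.relu(self.scene_fuse(torch.cat([h_traj[-1], h_hsi[-1]], dim=-1)))
        logits = self.final_fuse(torch.cat([h_skel[-1], scene], dim=-1))
        return logits

# Usage with random tensors: a batch of 4 clips, 30 frames each.
model = MultiStageFusion()
logits = model(torch.randn(4, 30, 75), torch.randn(4, 30, 2), torch.randn(4, 30, 32))
print(logits.shape)  # torch.Size([4, 10])
```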
Related papers
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured human-scene interaction (HSI) dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Position and Orientation-Aware One-Shot Learning for Medical Action Recognition from Signal Data [9.757753196253532]
We propose a position and orientation-aware one-shot learning framework for medical action recognition from signal data.
The proposed framework comprises two stages, and each stage includes signal-level image generation (SIG), cross-attention (CsA), and dynamic time warping (DTW) modules.
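Dynamic time warping is the most self-contained of the modules named above; a standard textbook DTW distance between two 1-D sequences looks like the sketch below (a generic implementation, not the paper's DTW module; the use of NumPy and the toy inputs are assumptions).

```python
# Generic dynamic time warping (DTW) distance between two 1-D sequences.
# Standard textbook formulation; not the paper's DTW module.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local cost
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

print(dtw_distance(np.array([0., 1., 2., 3.]), np.array([0., 1., 1., 2., 3.])))  # 0.0
```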
arXiv Detail & Related papers (2023-09-27T13:08:15Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
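The entry describes mapping fMRI signals and images into a shared latent space; a minimal sketch of such latent alignment, assuming two small MLP encoders and a cosine-based alignment loss (neither taken from the paper), is:

```python
# Hypothetical latent-alignment sketch: embed fMRI vectors and image features
# into a shared latent space and pull paired embeddings together.
import torch
import torch.nn as nn
import torch.nn.functional as F

fmri_enc = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Linear(256, 128))
img_enc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))

def alignment_loss(fmri, img_feat):
    z_f = F.normalize(fmri_enc(fmri), dim=-1)
    z_i = F.normalize(img_enc(img_feat), dim=-1)
    return (1.0 - (z_f * z_i).sum(dim=-1)).mean()  # 1 - cosine similarity of pairs

loss = alignment_loss(torch.randn(8, 4096), torch.randn(8, 512))
loss.backward()
```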
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms [39.11134330259464]
Holistic modeling of the operating room (OR) is a challenging but essential task.
We introduce memory scene graphs, where the scene graphs of previous time steps act as the temporal representation guiding the current prediction.
We design an end-to-end architecture that intelligently fuses the temporal information of our lightweight memory scene graphs with the visual information from point clouds and images.
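A memory of earlier scene-graph representations guiding the current prediction could, in its simplest form, look like the sketch below; the fixed-length buffer, mean pooling, and feature sizes are illustrative assumptions rather than the LABRAD-OR design.

```python
# Illustrative memory-scene-graph buffer: keep embeddings of the last K scene graphs
# and fuse their mean with the current frame's visual features. Not the paper's model.
from collections import deque
import torch
import torch.nn as nn

class SceneGraphMemory(nn.Module):
    def __init__(self, feat_dim=256, memory_len=8):
        super().__init__()
        self.memory = deque(maxlen=memory_len)   # embeddings of previous scene graphs
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, visual_feat, prev_graph_embedding=None):
        if prev_graph_embedding is not None:
            self.memory.append(prev_graph_embedding.detach())
        if self.memory:
            context = torch.stack(list(self.memory)).mean(dim=0)
        else:
            context = torch.zeros_like(visual_feat)
        return self.fuse(torch.cat([visual_feat, context], dim=-1))

mem = SceneGraphMemory()
out = mem(torch.randn(256), torch.randn(256))
print(out.shape)  # torch.Size([256])
```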
arXiv Detail & Related papers (2023-03-23T14:26:16Z) - GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
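Bidirectional communication between a gaze branch and a motion branch can be illustrated with two cross-attention blocks, one per direction; the sketch below uses standard multi-head attention with assumed shapes and is not GIMO's actual architecture.

```python
# Hypothetical bidirectional cross-attention between gaze and motion features.
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.gaze_to_motion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.motion_to_gaze = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, gaze_feat, motion_feat):
        # Motion queries attend to gaze features, and vice versa.
        motion_out, _ = self.gaze_to_motion(motion_feat, gaze_feat, gaze_feat)
        gaze_out, _ = self.motion_to_gaze(gaze_feat, motion_feat, motion_feat)
        return gaze_feat + gaze_out, motion_feat + motion_out

block = BidirectionalCrossAttention()
g, m = block(torch.randn(2, 10, 128), torch.randn(2, 30, 128))
print(g.shape, m.shape)  # torch.Size([2, 10, 128]) torch.Size([2, 30, 128])
```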
arXiv Detail & Related papers (2022-04-20T13:17:39Z) - 4D-OR: Semantic Scene Graphs for OR Domain Modeling [72.1320671045942]
We propose using semantic scene graphs (SSG) to describe and summarize the surgical scene.
The nodes of the scene graphs represent different actors and objects in the room, such as medical staff, patients, and medical equipment.
We create the first publicly available 4D surgical SSG dataset, 4D-OR, containing ten simulated total knee replacement surgeries.
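A semantic scene graph of this kind, with actors/objects as nodes and labeled relations as edges, can be represented very simply; the entities and relation names below are illustrative and not taken from the 4D-OR annotation scheme.

```python
# Minimal semantic scene graph: nodes are actors/objects, edges are labeled relations.
# Entity and relation names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)   # (subject, relation, object) triples

    def add_relation(self, subj, rel, obj):
        self.nodes.update({subj, obj})
        self.edges.append((subj, rel, obj))

frame_graph = SceneGraph()
frame_graph.add_relation("surgeon", "operating_on", "patient")
frame_graph.add_relation("nurse", "holding", "instrument")
print(frame_graph.edges)
```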
arXiv Detail & Related papers (2022-03-22T17:59:45Z) - Semantic Labeling of Human Action For Visually Impaired And Blind People Scene Interaction [1.52292571922932]
The aim of this work is to contribute to the development of a tactile device for visually impaired and blind persons.
We use the skeleton information provided by Kinect with the disentangled and unified multi-scale graph convolutional (MS-G3D) model to recognize the performed actions.
The recognized actions are labeled semantically and will be mapped into an output device perceivable by the touch sense.
arXiv Detail & Related papers (2022-01-12T21:21:05Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
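A per-joint visibility indicator is commonly used to mask the forecasting loss so that invisible joints do not contribute; the sketch below shows that generic idea as a masked L2 loss, not TRiPOD's exact formulation.

```python
# Generic visibility-masked forecasting loss: invisible joints (mask = 0) are ignored.
import torch

def masked_joint_loss(pred, target, visibility):
    # pred, target: (B, T, J, 2); visibility: (B, T, J) with 1 = visible, 0 = invisible.
    err = ((pred - target) ** 2).sum(dim=-1)          # per-joint squared error
    masked = err * visibility
    return masked.sum() / visibility.sum().clamp(min=1.0)

loss = masked_joint_loss(torch.randn(2, 5, 14, 2), torch.randn(2, 5, 14, 2),
                         torch.randint(0, 2, (2, 5, 14)).float())
print(loss)
```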
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)