Related papers: WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks

WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks

URL: http://arxiv.org/abs/2504.16655v1
Date: Wed, 23 Apr 2025 12:22:24 GMT
Title: WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks
Authors: Younggeol Cho, Elisa Motta, Olivia Nocentini, Marta Lagomarsino, Andrea Merello, Marco Crepaldi, Arash Ajoudani,
Abstract summary: Transformer based Decoder Network (TED Net) designed for estimating human skeleton poses from WiFi signals.<n>TED Net integrates convolutional encoders with transformer based attention mechanisms to capture skeleton features from CSI signals.
Score: 8.427757893042374
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human pose estimation and action recognition have received attention due to their critical roles in healthcare monitoring, rehabilitation, and assistive technologies. In this study, we proposed a novel architecture named Transformer based Encoder Decoder Network (TED Net) designed for estimating human skeleton poses from WiFi Channel State Information (CSI). TED Net integrates convolutional encoders with transformer based attention mechanisms to capture spatiotemporal features from CSI signals. The estimated skeleton poses were used as input to a customized Directed Graph Neural Network (DGNN) for action recognition. We validated our model on two datasets: a publicly available multi modal dataset for assessing general pose estimation, and a newly collected dataset focused on fall related scenarios involving 20 participants. Experimental results demonstrated that TED Net outperformed existing approaches in pose estimation, and that the DGNN achieves reliable action classification using CSI based skeletons, with performance comparable to RGB based systems. Notably, TED Net maintains robust performance across both fall and non fall cases. These findings highlight the potential of CSI driven human skeleton estimation for effective action recognition, particularly in home environments such as elderly fall detection. In such settings, WiFi signals are often readily available, offering a privacy preserving alternative to vision based methods, which may raise concerns about continuous camera monitoring.

Related papers

Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking [67.65578956523403]
This paper proposes a generative framework to recover location labels directly from sparse channel state information (CSI) measurements.<n>Instead of directly storing raw CSI, we learn a compact low-dimensional radio map embedding and leverage a generative model to reconstruct the high-dimensional CSI.<n> Numerical experiments demonstrate that the proposed model can improve localization accuracy by over 30% and achieve a 20% capacity gain in non-line-of-sight (NLOS) scenarios.
arXiv Detail & Related papers (2025-11-21T07:25:49Z)
maxVSTAR: Maximally Adaptive Vision-Guided CSI Sensing with Closed-Loop Edge Model Adaptation for Robust Human Activity Recognition [0.0]
maxVSTAR is a vision-guided model adaptation framework that mitigates domain shift for edge-deployed CSI sensing systems.<n>The proposed system integrates a cross-modal teacher-student architecture, where a high-accuracy YOLO-based vision model serves as a dynamic supervisory signal.<n> Extensive experiments demonstrate the effectiveness of maxVSTAR.
arXiv Detail & Related papers (2025-10-30T04:59:28Z)
Anticipatory Fall Detection in Humans with Hybrid Directed Graph Neural Networks and Long Short-Term Memory [12.677218248209494]
We propose a hybrid model combining Dynamic Graph Neural Networks (DGNN) with Long Short-Term Memory (LSTM) networks to anticipate falls.<n>Our approach employs real-time skeletal features extracted from video sequences as input for the proposed model.<n>The LSTM-based network then predicts human movement in subsequent time steps, enabling early detection of falls.
arXiv Detail & Related papers (2025-09-01T12:56:31Z)
Learning a Neural Association Network for Self-supervised Multi-Object Tracking [34.07776597698471]
This paper introduces a novel framework to learn data association for multi-object tracking in a self-supervised manner. Motivated by the fact that in real-world scenarios object motion can be usually represented by a Markov process, we present a novel expectation (EM) algorithm that trains a neural network to associate detections for tracking. We evaluate our approach on the challenging MOT17 and MOT20 datasets and achieve state-of-the-art results in comparison to self-supervised trackers.
arXiv Detail & Related papers (2024-11-18T12:22:29Z)
Data Augmentation Techniques for Cross-Domain WiFi CSI-based Human Activity Recognition [1.7404865362620803]
WiFi Channel State Information (CSI) enables contactless and visual privacy-preserving sensing in indoor environments. Poor model generalization, due to varying environmental conditions and sensing hardware, is a well-known problem in this space. Data augmentation techniques commonly used in image-based learning are applied to WiFi CSI to investigate their effects on model generalization performance.
arXiv Detail & Related papers (2024-01-01T22:27:59Z)
EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos. In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame. In order to better adopt the VTN for the sparse and fine-grained nature of event data, we design Event-Contrastive Loss ($mathcalL_EC$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder. Specifically, the CD-JBF-GC can explore the motion transmission between the joint stream and the bone stream. The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds [51.47100091540298]
We present Cascaded Primitive Fitting Networks (CPFN) that relies on an adaptive patch sampling network to assemble detection results of global and local primitive detection networks. CPFN improves the state-of-the-art SPFN performance by 13-14% on high-resolution point cloud datasets and specifically improves the detection of fine-scale primitives by 20-22%.
arXiv Detail & Related papers (2021-08-31T23:27:33Z)
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks. To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame. Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition [77.77786072373942]
This paper proposes a Transferable Neighborhood Discovery (TraND) framework to bridge the domain gap for unsupervised cross-domain gait recognition. We design an end-to-end trainable approach to automatically discover the confident neighborhoods of unlabeled samples in the latent space. Our method achieves state-of-the-art results on two public datasets, i.e., CASIA-B and OU-LP.
arXiv Detail & Related papers (2021-02-09T03:07:07Z)
Deep Learning based Covert Attack Identification for Industrial Control Systems [5.299113288020827]
We develop a data-driven framework that can be used to detect, diagnose, and localize a type of cyberattack called covert attacks on smart grids. The framework has a hybrid design that combines an autoencoder, a recurrent neural network (RNN) with a Long-Short-Term-Memory layer, and a Deep Neural Network (DNN)
arXiv Detail & Related papers (2020-09-25T17:48:43Z)
Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties. Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates. The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method [1.027974860479791]
We address challenges of the preprocessing phase, by an automated selection of representative frames among the input sequences. We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network and skeletal modelling method. We name our model as Feature Reduction & Deep Learning based action recognition method, or FR-DL in short.
arXiv Detail & Related papers (2020-07-06T15:12:50Z)
Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions. Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks. This paper presents approaches aimed to achieve two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise, reverberation and b) reduce the system qualitydegradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.