USAD: End-to-End Human Activity Recognition via Diffusion Model with Spatiotemporal Attention
- URL: http://arxiv.org/abs/2507.02827v2
- Date: Fri, 11 Jul 2025 15:13:39 GMT
- Title: USAD: End-to-End Human Activity Recognition via Diffusion Model with Spatiotemporal Attention
- Authors: Hang Xiao, Ying Yu, Jiarui Li, Zhifan Yang, Haotian Tang, Hanyu Liu, Chao Li,
- Abstract summary: Human activity recognition is a task that finds broad applications in health monitoring, safety protection, and sports analysis. Despite proliferating research, human activity recognition still faces key challenges, including the scarcity of labeled samples for rare activities. This paper proposes a comprehensive optimization approach centered on multi-attention interaction mechanisms.
- Score: 8.061018589897277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The primary objective of human activity recognition (HAR) is to infer ongoing human actions from sensor data, a task that finds broad applications in health monitoring, safety protection, and sports analysis. Despite proliferating research, HAR still faces key challenges, including the scarcity of labeled samples for rare activities, insufficient extraction of high-level features, and suboptimal model performance on lightweight devices. To address these issues, this paper proposes a comprehensive optimization approach centered on multi-attention interaction mechanisms. First, an unsupervised, statistics-guided diffusion model is employed to perform data augmentation, thereby alleviating the problems of labeled data scarcity and severe class imbalance. Second, a multi-branch spatio-temporal interaction network is designed, which captures multi-scale features of sequential data through parallel residual branches with 3×3, 5×5, and 7×7 convolutional kernels. Simultaneously, temporal attention mechanisms are incorporated to identify critical time points, while spatial attention enhances inter-sensor interactions. A cross-branch feature fusion unit is further introduced to improve the overall feature representation capability. Finally, an adaptive multi-loss function fusion strategy is integrated, allowing for dynamic adjustment of loss weights and overall model optimization. Experimental results on three public datasets, WISDM, PAMAP2, and OPPORTUNITY, demonstrate that the proposed unsupervised data augmentation spatio-temporal attention diffusion network (USAD) achieves accuracies of 98.84%, 93.81%, and 80.92%, respectively, significantly outperforming existing approaches. Furthermore, practical deployment on embedded devices verifies the efficiency and feasibility of the proposed method.
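The multi-branch idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fixed 1D moving-average kernels of width 3, 5, and 7 along the time axis in place of learned convolutions, an element-wise mean in place of the cross-branch fusion unit, and a softmax over per-step magnitude as a stand-in for temporal attention. The names `conv1d_same`, `multi_branch`, and `temporal_attention` are hypothetical.

```python
import math


def conv1d_same(signal, kernel_size):
    """Moving-average filter with zero padding; output length equals input length."""
    pad = kernel_size // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(signal))]


def multi_branch(signal, kernel_sizes=(3, 5, 7)):
    """Run one residual branch per kernel size and fuse the branches by averaging."""
    branches = []
    for k in kernel_sizes:
        filtered = conv1d_same(signal, k)
        # Residual connection: add the branch input back to the filtered output.
        branches.append([x + f for x, f in zip(signal, filtered)])
    # Assumed fusion rule: element-wise mean across branches.
    return [sum(vals) / len(vals) for vals in zip(*branches)]


def temporal_attention(features):
    """Softmax over time steps scored by |feature|; returns weights and the pooled value."""
    scores = [abs(f) for f in features]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = sum(w * f for w, f in zip(weights, features))
    return weights, pooled


# Toy single-sensor window with one spike at index 3.
window = [0.0, 0.1, 0.0, 2.0, 0.1, 0.0, 0.1, 0.0]
fused = multi_branch(window)
weights, pooled = temporal_attention(fused)
peak = max(range(len(weights)), key=weights.__getitem__)
```

In this toy window the attention weights sum to one and concentrate on the spike, which is the behavior the abstract attributes to identifying critical time points. A real model would learn both the kernels and the attention scores.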
Related papers
- Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization [66.10528870853324]
Fusing and balancing multi-modal inputs from novel sensors for dense prediction tasks is critically important. One major limitation is the tendency of multi-modal frameworks to over-rely on easily learnable modalities. We propose a plug-and-play regularization term based on functional entropy, which introduces no additional parameters.
arXiv Detail & Related papers (2025-05-10T12:58:15Z)
- VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences. Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and an accuracy of up to 99.92%.
arXiv Detail & Related papers (2025-04-14T13:38:00Z)
- CMD-HAR: Cross-Modal Disentanglement for Wearable Human Activity Recognition [9.891343123345829]
Human Activity Recognition (HAR) is a fundamental technology for numerous human-centered intelligent applications. The aim of this paper is to address issues such as multimodal data mixing, activity disc and complex model deployment in sensor-based human activity.
arXiv Detail & Related papers (2025-03-27T15:21:49Z)
- Process Optimization and Deployment for Sensor-Based Human Activity Recognition Based on Deep Learning [9.445469731895505]
We propose a comprehensive optimization process approach centered on multi-attention interaction. We conduct extensive testing on three public datasets, including ablation studies, comparisons of related work and embedded deployments.
arXiv Detail & Related papers (2025-03-22T16:48:16Z)
- MSCA-Net:Multi-Scale Context Aggregation Network for Infrared Small Target Detection [0.1759252234439348]
This paper proposes a network architecture named MSCA-Net, which integrates three key components. MSEDA employs a multi-scale feature fusion attention mechanism to adaptively aggregate information across different scales. PCBAM captures the correlation between global and local features through a correlation matrix-based strategy. CAB enhances the representation of critical features by assigning greater weights to them, integrating both low-level and high-level information.
arXiv Detail & Related papers (2025-03-21T14:42:31Z)
- Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection [1.9223495770071632]
This system uses three feature streams: RGB video, optical flow, and audio signals, where each stream extracts complementary spatial and temporal features.
The system significantly improves anomaly detection accuracy and robustness across three datasets.
arXiv Detail & Related papers (2024-09-17T14:17:52Z)
- Distribution Discrepancy and Feature Heterogeneity for Active 3D Object Detection [18.285299184361598]
LiDAR-based 3D object detection is a critical technology for the development of autonomous driving and robotics.
We propose a novel and effective active learning (AL) method called Distribution Discrepancy and Feature Heterogeneity (DDFH).
It simultaneously considers geometric features and model embeddings, assessing information from both the instance-level and frame-level perspectives.
arXiv Detail & Related papers (2024-09-09T08:26:11Z)
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation [102.24108167002252]
We propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamic user preferences.
We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T03:54:11Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to speed up large-scale unsupervised heterogeneous outlier detection.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.