MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human
Activity Recognition
- URL: http://arxiv.org/abs/2210.09222v2
- Date: Wed, 11 Oct 2023 19:59:02 GMT
- Authors: Ziqi Gao, Yuntao Wang, Jianguo Chen, Junliang Xing, Shwetak Patel, Xin
Liu, Yuanchun Shi
- Abstract summary: Multimodal sensors provide complementary information to develop accurate machine-learning methods for human activity recognition.
This paper proposes an efficient multimodal neural architecture for HAR using an RGB camera and inertial measurement units (IMUs).
Using three well-established public datasets, we evaluated MMTSA's effectiveness and efficiency in HAR.
- Score: 33.94582546667864
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multimodal sensors provide complementary information to develop accurate
machine-learning methods for human activity recognition (HAR), but introduce
significantly higher computational load, which reduces efficiency. This paper
proposes an efficient multimodal neural architecture for HAR using an RGB
camera and inertial measurement units (IMUs) called Multimodal Temporal Segment
Attention Network (MMTSA). MMTSA first transforms IMU sensor data into a
temporal and structure-preserving gray-scale image using the Gramian Angular
Field (GAF), representing the inherent properties of human activities. MMTSA
then applies a multimodal sparse sampling method to reduce data redundancy.
Lastly, MMTSA adopts an inter-segment attention module for efficient multimodal
fusion. Using three well-established public datasets, we evaluated MMTSA's
effectiveness and efficiency in HAR. Results show that our method improves the
cross-subject F1-score on the MMAct dataset by 11.13% over previous
state-of-the-art (SOTA) methods. The ablation study and analysis suggest that
MMTSA is effective in fusing multimodal data for accurate HAR. The efficiency
evaluation on an edge device showed that MMTSA
achieved significantly better accuracy, lower computational load, and lower
inference latency than SOTA methods.
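The abstract's first step, encoding an IMU series as a gray-scale image with the Gramian Angular Field, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the min-max rescaling to [-1, 1], the choice between the summation and difference variants, and the 8-bit quantization are all assumptions made for illustration, based on the standard GAF definition.

```python
import numpy as np


def gramian_angular_field(series, kind="summation"):
    """Encode a 1-D series as a Gramian Angular Field (illustrative sketch)."""
    x = np.asarray(series, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant series: map every sample to 0 to avoid dividing by zero.
        scaled = np.zeros_like(x)
    else:
        # Min-max rescale to [-1, 1] so that arccos is defined.
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)

    # Polar encoding: each sample becomes an angle in [0, pi].
    phi = np.arccos(scaled)

    if kind == "summation":
        # GASF[i, j] = cos(phi_i + phi_j)
        return np.cos(phi[:, None] + phi[None, :])
    if kind == "difference":
        # GADF[i, j] = sin(phi_i - phi_j)
        return np.sin(phi[:, None] - phi[None, :])
    raise ValueError("kind must be 'summation' or 'difference'")


def to_grayscale(gaf):
    """Map GAF values from [-1, 1] to 8-bit gray levels (assumed quantization)."""
    return np.round((gaf + 1.0) * 127.5).astype(np.uint8)


if __name__ == "__main__":
    # Hypothetical single-axis accelerometer window.
    window = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
    image = to_grayscale(gramian_angular_field(window))
    print(image.shape, image.dtype)
```

Each sample pair (i, j) contributes one pixel, so a window of length N yields an N x N image whose diagonal keeps the rescaled signal values and whose off-diagonal entries encode temporal correlations between time steps.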
Related papers
- AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection [23.91870504363899]
Double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data.
This has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems.
We introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network.
arXiv Detail & Related papers (2024-05-21T17:17:17Z)
- AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation [4.618389486337933]
We propose AMMUNet, a UNet-based framework that employs multi-scale attention map merging.
The proposed attention map merging mechanism (AMMM) effectively combines multi-scale attention maps into a unified representation using a fixed mask template.
We show that our approach achieves remarkable mean intersection over union (mIoU) scores of 75.48% on the Vaihingen dataset and an exceptional 77.90% on the Potsdam dataset.
arXiv Detail & Related papers (2024-04-20T15:23:15Z)
- HARMamba: Efficient and Lightweight Wearable Sensor Human Activity Recognition Based on Bidirectional Mamba [7.412537185607976]
Wearable sensor-based human activity recognition (HAR) is a critical research domain in activity perception.
This study introduces HARMamba, an innovative lightweight and versatile HAR architecture that combines a selective bidirectional state space model with hardware-aware design.
HARMamba outperforms contemporary state-of-the-art frameworks, delivering comparable or better accuracy while significantly reducing computational and memory demands.
arXiv Detail & Related papers (2024-03-29T13:57:46Z)
- PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- Dimensionality Expansion of Load Monitoring Time Series and Transfer Learning for EMS [0.7133136338850781]
Energy management systems rely on (non)-intrusive load monitoring (N)ILM to monitor and manage appliances.
We propose a new approach for load monitoring in building EMS based on dimensionality expansion of time series and transfer learning.
arXiv Detail & Related papers (2022-04-06T13:13:24Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
- A Deep Learning Method for Complex Human Activity Recognition Using Virtual Wearable Sensors [22.923108537119685]
Sensor-based human activity recognition (HAR) is now a research hotspot in multiple application areas.
We propose a novel method based on deep learning for complex HAR in real-world scenes.
The proposed method can surprisingly converge in a few iterations and achieve an accuracy of 91.15% on a real IMU dataset.
arXiv Detail & Related papers (2020-03-04T03:31:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.