Electromyography-Based Gesture Recognition: Hierarchical Feature Extraction for Enhanced Spatial-Temporal Dynamics
- URL: http://arxiv.org/abs/2504.03221v1
- Date: Fri, 04 Apr 2025 07:11:12 GMT
- Title: Electromyography-Based Gesture Recognition: Hierarchical Feature Extraction for Enhanced Spatial-Temporal Dynamics
- Authors: Jungpil Shin, Abu Saleh Musa Miah, Sota Konnai, Shu Hoshitaka, Pankoo Kim
- Abstract summary: We propose a lightweight squeeze-and-excitation deep learning-based multi-stream spatial-temporal dynamics time-varying feature extraction approach. The proposed model was tested on the Ninapro DB2, DB4, and DB5 datasets, achieving accuracy rates of 96.41%, 92.40%, and 93.34%, respectively.
- Score: 0.7083699704958353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hand gesture recognition using multichannel surface electromyography (sEMG) is challenging due to unstable predictions and inefficient enhancement of time-varying features. To overcome the lack of signal-based time-varying features, we propose a lightweight squeeze-and-excitation deep learning-based multi-stream spatial-temporal feature extraction approach to build an effective sEMG-based hand gesture recognition system. Each branch of the proposed model is designed to extract hierarchical features, capturing both global and detailed spatial-temporal relationships to ensure feature effectiveness. The first branch, a Bidirectional Temporal Convolutional Network (Bi-TCN), captures long-term temporal dependencies by modelling both past and future temporal contexts, providing a holistic view of gesture dynamics. The second branch, combining a 1D convolutional layer, a separable CNN, and a Squeeze-and-Excitation (SE) block, efficiently extracts spatial-temporal features while emphasizing critical feature channels, enhancing feature relevance. The third branch, combining a Temporal Convolutional Network (TCN) and a Bidirectional LSTM (BiLSTM), captures bidirectional temporal relationships and time-varying patterns. The outputs of all branches are fused by concatenation to capture subtle variations in the data and then refined with a channel attention module that selectively focuses on the most informative features while improving computational efficiency. The proposed model was tested on the Ninapro DB2, DB4, and DB5 datasets, achieving accuracy rates of 96.41%, 92.40%, and 93.34%, respectively. These results demonstrate the capability of the system to handle complex sEMG dynamics, offering advances in prosthetic limb control and human-machine interface technologies, with significant implications for assistive technologies.
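The abstract names the branch types but not their hyperparameters. Below is a minimal PyTorch sketch of the described three-branch layout; kernel sizes, channel widths, the SE reduction ratio, and the input/output dimensions (12 sEMG channels and 49 gesture classes, loosely following Ninapro DB2) are illustrative assumptions, and all class and parameter names are ours, not the authors' released code.

```python
# Hedged sketch of the three-branch sEMG model described in the abstract.
# All hyperparameters below are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation over the channel axis of a (B, C, T) tensor."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):                 # x: (batch, channels, time)
        w = self.fc(x.mean(dim=-1))       # squeeze: global average over time
        return x * w.unsqueeze(-1)        # excite: per-channel reweighting

class BiTCN(nn.Module):
    """Branch 1: dilated temporal convs run forward and time-reversed."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.fwd = nn.Conv1d(in_ch, out_ch // 2, 3, padding=2, dilation=2)
        self.bwd = nn.Conv1d(in_ch, out_ch // 2, 3, padding=2, dilation=2)
    def forward(self, x):
        f = self.fwd(x)                           # past-to-future context
        b = self.bwd(x.flip(-1)).flip(-1)         # future-to-past context
        return torch.relu(torch.cat([f, b], dim=1))

class SeparableSEBranch(nn.Module):
    """Branch 2: 1D conv -> depthwise-separable conv -> SE block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, 3, padding=1)
        self.depthwise = nn.Conv1d(out_ch, out_ch, 3, padding=1, groups=out_ch)
        self.pointwise = nn.Conv1d(out_ch, out_ch, 1)
        self.se = SEBlock(out_ch)
    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = torch.relu(self.pointwise(self.depthwise(x)))
        return self.se(x)

class TCNBiLSTM(nn.Module):
    """Branch 3: temporal conv followed by a bidirectional LSTM."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.tcn = nn.Conv1d(in_ch, out_ch, 3, padding=1)
        self.lstm = nn.LSTM(out_ch, out_ch // 2, batch_first=True,
                            bidirectional=True)
    def forward(self, x):                         # x: (B, C, T)
        h = torch.relu(self.tcn(x)).transpose(1, 2)   # -> (B, T, C)
        h, _ = self.lstm(h)
        return h.transpose(1, 2)                  # back to (B, C, T)

class MultiStreamEMG(nn.Module):
    """Concatenation fusion of all branches, then channel attention."""
    def __init__(self, emg_channels=12, branch_ch=64, n_gestures=49):
        super().__init__()
        self.b1 = BiTCN(emg_channels, branch_ch)
        self.b2 = SeparableSEBranch(emg_channels, branch_ch)
        self.b3 = TCNBiLSTM(emg_channels, branch_ch)
        self.channel_attn = SEBlock(3 * branch_ch)
        self.head = nn.Linear(3 * branch_ch, n_gestures)
    def forward(self, x):                         # x: (B, emg_channels, T)
        z = torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
        z = self.channel_attn(z)                  # refine fused features
        return self.head(z.mean(dim=-1))          # pool over time, classify

model = MultiStreamEMG()
logits = model(torch.randn(8, 12, 200))   # 8 windows, 12 channels, 200 samples
print(logits.shape)                        # torch.Size([8, 49])
```

Here the same SE-style module stands in for the channel-attention stage on the fused features, since both reweight channels by informativeness; the authors' actual attention module may differ in design.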
Related papers
- SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - Multimodal Attention-Enhanced Feature Fusion-based Weakly Supervised Anomaly Violence Detection [1.9223495770071632]
This system uses three feature streams: RGB video, optical flow, and audio signals, where each stream extracts complementary spatial and temporal features.
The system significantly improves anomaly detection accuracy and robustness across three datasets.
arXiv Detail & Related papers (2024-09-17T14:17:52Z) - Attention-based Spatial-Temporal Graph Convolutional Recurrent Networks for Traffic Forecasting [12.568905377581647]
Traffic forecasting is one of the most fundamental problems in transportation science and artificial intelligence.
Existing methods cannot accurately model both long-term and short-term temporal correlations simultaneously.
We propose a novel spatial-temporal neural network framework, which consists of a graph convolutional recurrent module (GCRN) and a global attention module.
arXiv Detail & Related papers (2023-02-25T03:37:00Z) - TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks [22.965024490694525]
Spiking Neural Networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency, and powerful spatio-temporal information representation ability.
We present a Temporal-Channel Joint Attention mechanism for SNNs, referred to as TCJA-SNN.
The proposed TCJA-SNN framework can effectively assess the significance of spike sequence from both spatial and temporal dimensions.
arXiv Detail & Related papers (2022-06-21T08:16:08Z) - Skeleton-based Action Recognition via Temporal-Channel Aggregation [5.620303498964992]
We propose a Temporal-Channel Aggregation Graph Convolutional Network (TCA-GCN) to learn spatial and temporal topologies.
In addition, we apply multi-scale skeletal temporal modeling and fuse the resulting features with prior skeletal knowledge using an attention mechanism.
arXiv Detail & Related papers (2022-05-31T16:28:30Z) - Discrete-time Temporal Network Embedding via Implicit Hierarchical Learning in Hyperbolic Space [43.280123606888395]
We propose a hyperbolic temporal graph network (HTGN) that takes advantage of the exponential capacity and hierarchical awareness of hyperbolic geometry.
HTGN maps the temporal graph into hyperbolic space, and incorporates hyperbolic graph neural network and hyperbolic gated recurrent neural network.
Experimental results on multiple real-world datasets demonstrate the superiority of HTGN for temporal graph embedding.
arXiv Detail & Related papers (2021-07-08T11:24:59Z) - Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
The proposed CTL framework utilizes a CNN backbone and a key-point estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z) - A Two-stream Neural Network for Pose-based Hand Gesture Recognition [23.50938160992517]
Pose-based hand gesture recognition has been widely studied in recent years.
This paper proposes a two-stream neural network, with one stream being a self-attention-based graph convolutional network (SAGCN) and the other a residual-connection-enhanced bidirectional IndRNN (Bi-IndRNN).
The residual-connection-enhanced Bi-IndRNN extends an IndRNN with the capability of bidirectional processing for temporal modelling.
arXiv Detail & Related papers (2021-01-22T03:22:26Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - On the spatial attention in Spatio-Temporal Graph Convolutional Networks for skeleton-based human action recognition [97.14064057840089]
Graph convolutional networks (GCNs) achieve promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph.
Most of the recently proposed GCN-based methods improve performance by learning the graph structure at each layer of the network.
arXiv Detail & Related papers (2020-11-07T19:03:04Z) - Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)