Time2Vec Transformer for Robust Gesture Recognition from Low-Density sEMG
- URL: http://arxiv.org/abs/2602.01855v2
- Date: Tue, 03 Feb 2026 13:09:27 GMT
- Title: Time2Vec Transformer for Robust Gesture Recognition from Low-Density sEMG
- Authors: Blagoj Hristov, Hristijan Gjoreski, Vesna Ojleska Latkoska, Gorjan Nadzinski
- Abstract summary: This paper presents a novel, data-efficient deep learning framework for myoelectric prosthesis control. Our approach implements a hybrid Transformer optimized for sparse, two-channel surface electromyography (sEMG). The proposed framework offers a robust, cost-effective blueprint for next-generation prosthetic interfaces capable of rapid personalization.
- Score: 1.231764991565978
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate and responsive myoelectric prosthesis control typically relies on complex, dense multi-sensor arrays, which limits consumer accessibility. This paper presents a novel, data-efficient deep learning framework designed to achieve precise and accurate control using minimal sensor hardware. Leveraging an external dataset of 8 subjects, our approach implements a hybrid Transformer optimized for sparse, two-channel surface electromyography (sEMG). Unlike standard architectures that use fixed positional encodings, we integrate Time2Vec learnable temporal embeddings to capture the stochastic temporal warping inherent in biological signals. Furthermore, we employ a normalized additive fusion strategy that aligns the latent distributions of spatial and temporal features, preventing the destructive interference common in standard implementations. A two-stage curriculum learning protocol is utilized to ensure robust feature extraction despite data scarcity. The proposed architecture achieves a state-of-the-art multi-subject F1-score of 95.7% $\pm$ 0.20% for a 10-class movement set, statistically outperforming both a standard Transformer with fixed encodings and a recurrent CNN-LSTM model. Architectural optimization reveals that a balanced allocation of model capacity between spatial and temporal dimensions yields the highest stability. Furthermore, while direct transfer to a new unseen subject led to poor accuracy due to domain shifts, a rapid calibration protocol utilizing only two trials per gesture recovered performance from 21.0% $\pm$ 2.98% to 96.9% $\pm$ 0.52%. By validating that high-fidelity temporal embeddings can compensate for low spatial resolution, this work challenges the necessity of high-density sensing. The proposed framework offers a robust, cost-effective blueprint for next-generation prosthetic interfaces capable of rapid personalization.
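The abstract's two core ideas can be illustrated concretely. The following is a minimal NumPy sketch, not the authors' code: it assumes the standard Time2Vec formulation (one linear component plus learnable sinusoidal components) and interprets "normalized additive fusion" as layer-norm-style standardization of each stream before summation. The parameter names (`omega`, `phi`), the embedding size, and the window length are illustrative.

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Time2Vec embedding: component 0 is a linear trend, the remaining
    k-1 components are sin terms with (learnable) frequency and phase.
    tau: (batch, seq_len, 1); omega, phi: (k,). Returns (batch, seq_len, k)."""
    linear = omega[0] * tau + phi[0]                 # (batch, seq_len, 1)
    periodic = np.sin(tau * omega[1:] + phi[1:])     # (batch, seq_len, k-1)
    return np.concatenate([linear, periodic], axis=-1)

def normalized_additive_fusion(spatial, temporal, eps=1e-5):
    """Standardize each stream along the feature axis before adding, so the
    latent distributions are aligned and neither branch dominates the sum."""
    def ln(x):
        mu = x.mean(axis=-1, keepdims=True)
        sd = x.std(axis=-1, keepdims=True)
        return (x - mu) / (sd + eps)
    return ln(spatial) + ln(temporal)

# Usage: a single 200-sample sEMG window, embedded into k = 8 dimensions.
rng = np.random.default_rng(0)
tau = np.arange(200, dtype=np.float64).reshape(1, 200, 1) / 200.0
omega, phi = rng.normal(size=8), rng.normal(size=8)
emb = time2vec(tau, omega, phi)
print(emb.shape)  # (1, 200, 8)
```

In a full model these embeddings would be added to (or concatenated with) the token features in place of fixed sinusoidal positional encodings, with `omega` and `phi` trained by backpropagation.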
Related papers
- PRISM: Performer RS-IMLE for Single-pass Multisensory Imitation Learning [51.24484551729328]
We introduce PRISM, a single-pass policy based on a batch-global rejection-sampling variant of IMLE. PRISM couples a temporal multisensory encoder with a linear-attention generator using a Performer architecture. We demonstrate the efficacy of PRISM on a diverse real-world hardware suite, including loco-manipulation using a Unitree Go2 with a 7-DoF D1 arm and tabletop manipulation with a UR5 manipulator.
arXiv Detail & Related papers (2026-02-02T17:57:37Z) - Leveraging Convolutional Sparse Autoencoders for Robust Movement Classification from Low-Density sEMG [0.46976113832881716]
This study proposes a deep learning framework for accurate gesture recognition using only two surface electromyography (sEMG) channels. We present a few-shot transfer learning protocol that improved performance on unseen subjects from a baseline of 35.1% $\pm$ 3.1% to 92.3% $\pm$ 0.9% with minimal calibration data.
arXiv Detail & Related papers (2026-01-30T14:21:46Z) - HCFT: Hierarchical Convolutional Fusion Transformer for EEG Decoding [9.572621097681646]
We propose a lightweight decoding framework named Hierarchical Convolutional Fusion Transformer (HCFT). HCFT combines dual-branch encoders and hierarchical Transformer blocks for multi-scale representation. Results show that HCFT achieves 80.83% average accuracy and a Cohen's kappa of 0.6165 on BCI IV-2b, as well as 99.10% sensitivity, 0.0236 false positives per hour, and 98.82% specificity on CHB-MIT.
arXiv Detail & Related papers (2026-01-18T06:36:30Z) - Lightweight Test-Time Adaptation for EMG-Based Gesture Recognition [2.414036142474149]
We propose a lightweight framework for Test-Time Adaptation (TTA) using a Temporal Convolutional Network (TCN) backbone. We introduce three deployment-ready strategies: (i) causal adaptive batch normalization for real-time statistical alignment; (ii) Gaussian Mixture Model (GMM) alignment with experience replay to prevent forgetting; and (iii) meta-learning for rapid, few-shot calibration. Our results show that experience-replay updates yield superior stability under limited data, while meta-learning achieves competitive performance in one- and two-shot regimes.
arXiv Detail & Related papers (2026-01-07T18:48:31Z) - Hierarchical Spatio-Temporal Attention Network with Adaptive Risk-Aware Decision for Forward Collision Warning in Complex Scenarios [7.238050152381639]
This paper introduces an integrated Forward Collision Warning framework that pairs a Hierarchical Spatio-Temporal Attention Network with a Dynamic Risk Threshold Adjustment algorithm. Tested across multi-scenario datasets, the complete system demonstrates high efficacy, achieving an F1 score of 0.912, a low false alarm rate of 8.2%, and an ample warning lead time of 2.8 seconds.
arXiv Detail & Related papers (2025-11-25T05:57:29Z) - Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models [97.55009021098554]
This work aims to identify the key determinants of SLMs' real-device latency and offer generalizable principles and methodologies for SLM design and training. We introduce a new family of hybrid SLMs, called Nemotron-Flash, which significantly advances the accuracy-efficiency frontier of state-of-the-art SLMs.
arXiv Detail & Related papers (2025-11-24T08:46:36Z) - Explainable AI for Accelerated Microstructure Imaging: A SHAP-Guided Protocol on the Connectome 2.0 scanner [0.8306551626831736]
This study proposes a reduced acquisition scheme for the Connectome 2.0 scanner that preserves model accuracy while substantially shortening scan duration. We developed a data-driven framework using explainable artificial intelligence with a guided feature elimination strategy to identify an optimal 8-feature subset from a 15-feature protocol.
arXiv Detail & Related papers (2025-09-11T14:53:26Z) - Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability. DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference. We propose Skip-DiT, an image and video generative DiT variant enhanced with Long-Skip-Connections (LSCs) - the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z) - Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z) - Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z) - Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach [79.59560136273917]
Limited communication resources (bandwidth and energy) and data heterogeneity across devices are the main bottlenecks for federated learning (FL).
We first devise a novel FL framework with partial model aggregation (PMA).
The proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets.
arXiv Detail & Related papers (2022-04-20T19:09:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.