Related papers: WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition

URL: http://arxiv.org/abs/2506.11168v1
Date: Thu, 12 Jun 2025 04:07:11 GMT
Title: WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition
Authors: Yanlong Chen, Mattia Orlandi, Pierangelo Maria Rapa, Simone Benatti, Luca Benini, Yawei Li
Abstract summary: WaveFormer is a lightweight transformer-based architecture tailored for sEMG gesture recognition. Our model integrates time-domain and frequency-domain features through a novel learnable wavelet transform, enhancing feature extraction. With just 3.1 million parameters, WaveFormer achieves 95% classification accuracy on the EPN612 dataset, outperforming larger models.
Score: 18.978031999678507
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Human-machine interaction, particularly in prosthetic and robotic control, has seen progress with gesture recognition via surface electromyographic (sEMG) signals. However, classifying similar gestures that produce nearly identical muscle signals remains a challenge, often reducing classification accuracy. Traditional deep learning models for sEMG gesture recognition are large and computationally expensive, limiting their deployment on resource-constrained embedded systems. In this work, we propose WaveFormer, a lightweight transformer-based architecture tailored for sEMG gesture recognition. Our model integrates time-domain and frequency-domain features through a novel learnable wavelet transform, enhancing feature extraction. In particular, the WaveletConv module, a multi-level wavelet decomposition layer with depthwise separable convolution, ensures both efficiency and compactness. With just 3.1 million parameters, WaveFormer achieves 95% classification accuracy on the EPN612 dataset, outperforming larger models. Furthermore, when profiled on a laptop equipped with an Intel CPU, INT8 quantization achieves real-time deployment with a 6.75 ms inference latency.

Related papers

Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z)
Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
WaveFormer: A 3D Transformer with Wavelet-Driven Feature Representation for Efficient Medical Image Segmentation [0.5312470855079862]
We present WaveFormer, a novel 3D-transformer for medical images. It is inspired by the top-down mechanism of the human visual recognition system. It preserves both global context and high-frequency details while replacing heavy upsampling layers with efficient wavelet-based summarization and reconstruction.
arXiv Detail & Related papers (2025-03-31T06:28:41Z)
Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods. Our WBANet achieves a percentage of correct classification (PCC) of 98.33%, 96.65%, and 96.62% on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
EMGTFNet: Fuzzy Vision Transformer to decode Upperlimb sEMG signals for Hand Gestures Recognition [0.1611401281366893]
We propose a Vision Transformer (ViT) based architecture with a Fuzzy Neural Block (FNB) called EMGTFNet to perform Hand Gesture Recognition. The accuracy of the proposed model is tested using the publicly available NinaPro database consisting of 49 different hand gestures.
arXiv Detail & Related papers (2023-09-23T18:55:26Z)
One-Dimensional Deep Image Prior for Curve Fitting of S-Parameters from Electromagnetic Solvers [57.441926088870325]
Deep Image Prior (DIP) is a technique that optimizes the weights of a randomly-initialized convolutional neural network to fit a signal from noisy or under-determined measurements. Relative to publicly available implementations of Vector Fitting (VF), our method shows superior performance on nearly all test examples.
arXiv Detail & Related papers (2023-06-06T20:28:37Z)
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning [138.29273453811945]
Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks. We propose a new Wavelet Vision Transformer (Wave-ViT) that formulates the invertible down-sampling with wavelet transforms and self-attention learning.
arXiv Detail & Related papers (2022-07-11T16:03:51Z)
Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience. We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
Bioformers: Embedding Transformers for Ultra-Low Power sEMG-based Gesture Recognition [21.486555297061717]
Human-machine interaction is gaining traction in rehabilitation tasks, such as controlling prosthetic hands or robotic arms. Gesture recognition exploiting surface electromyographic (sEMG) signals is one of the most promising approaches. However, the analysis of these signals still presents many challenges since similar gestures result in similar muscle contractions.
arXiv Detail & Related papers (2022-03-24T08:37:26Z)
FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain. Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
ViT-HGR: Vision Transformer-based Hand Gesture Recognition from High Density Surface EMG Signals [14.419091034872682]
We investigate and design a Vision Transformer (ViT) based architecture to perform hand gesture recognition from High Density (HD-sEMG) signals. The proposed ViT-HGR framework can overcome the training time problems and can accurately classify a large number of hand gestures from scratch. Our experiments with 64-sample (31.25 ms) window size yield average test accuracy of 84.62 +/- 3.07%, where only 78,210 parameters are utilized.
arXiv Detail & Related papers (2022-01-25T02:42:50Z)
Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal [11.76969975145963]
DI-Gesture is a domain-independent and real-time mmWave gesture recognition system. In a real-time scenario, the accuracy of DI-Gesture reaches over 97% with an average inference time of 2.87 ms.
arXiv Detail & Related papers (2021-11-11T13:28:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.