Masked Autoencoder with Swin Transformer Network for Mitigating Electrode Shift in HD-EMG-based Gesture Recognition
- URL: http://arxiv.org/abs/2410.17261v1
- Date: Mon, 07 Oct 2024 02:55:36 GMT
- Title: Masked Autoencoder with Swin Transformer Network for Mitigating Electrode Shift in HD-EMG-based Gesture Recognition
- Authors: Kasra Laamerad, Mehran Shabanpour, Md. Rabiul Islam, Arash Mohammadi
- Abstract summary: Pattern recognition models based on HD-sEMG are vulnerable to changing recording conditions.
The paper proposes the Masked Autoencoder with Swin Transformer (MAST) framework, where training is performed on a masked subset of HD-sEMG channels.
- Score: 6.19619911492252
- License:
- Abstract: Multi-channel surface Electromyography (sEMG), also referred to as high-density sEMG (HD-sEMG), plays a crucial role in improving gesture recognition performance for myoelectric control. Pattern recognition models developed based on HD-sEMG, however, are vulnerable to changing recording conditions (e.g., signal variability due to electrode shift). This has resulted in significant degradation in performance across subjects and sessions. In this context, the paper proposes the Masked Autoencoder with Swin Transformer (MAST) framework, where training is performed on a masked subset of HD-sEMG channels. A combination of four masking strategies, i.e., random block masking, temporal masking, sensor-wise random masking, and multi-scale masking, is used to learn latent representations and increase robustness against electrode shift. The masked data is then passed through MAST's three-path encoder-decoder structure, leveraging a multi-path Swin-Unet architecture that simultaneously captures time-domain, frequency-domain, and magnitude-based features of the underlying HD-sEMG signal. These augmented inputs are then used in a self-supervised pre-training fashion to improve the model's generalization capabilities. Experimental results demonstrate the superior performance of the proposed MAST framework in comparison to its counterparts.
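The four masking strategies named in the abstract can be illustrated on a synthetic HD-sEMG array. This is a rough sketch, not the paper's implementation: the channel count, window length, block sizes, and masking ratios below are all made-up parameters for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_block_mask(x, block=(4, 64), ratio=0.3):
    """Zero out random rectangular (channel, time) blocks."""
    x = x.copy()
    c, t = x.shape
    n_blocks = int(ratio * c * t / (block[0] * block[1]))
    for _ in range(n_blocks):
        ci = rng.integers(0, max(1, c - block[0]))
        ti = rng.integers(0, max(1, t - block[1]))
        x[ci:ci + block[0], ti:ti + block[1]] = 0.0
    return x

def temporal_mask(x, span=128, ratio=0.3):
    """Zero out whole time spans across all channels."""
    x = x.copy()
    t = x.shape[1]
    for _ in range(int(ratio * t / span)):
        ti = rng.integers(0, max(1, t - span))
        x[:, ti:ti + span] = 0.0
    return x

def sensor_mask(x, ratio=0.3):
    """Zero out randomly chosen channels (electrodes) entirely."""
    x = x.copy()
    c = x.shape[0]
    drop = rng.choice(c, size=int(ratio * c), replace=False)
    x[drop, :] = 0.0
    return x

def multi_scale_mask(x, scales=((2, 32), (4, 64), (8, 128)), ratio=0.1):
    """Apply block masking at several block sizes."""
    for block in scales:
        x = random_block_mask(x, block=block, ratio=ratio)
    return x

emg = rng.standard_normal((64, 1024))   # 64 channels x 1024 samples (illustrative shape)
masked = sensor_mask(temporal_mask(emg))
```

In a self-supervised pre-training loop, such masked views would be fed to the encoder-decoder, with the reconstruction loss computed against the unmasked signal.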
Related papers
- Spatial Adaptation Layer: Interpretable Domain Adaptation For Biosignal Sensor Array Applications [0.7499722271664147]
Biosignal acquisition is key for healthcare applications and wearable devices.
Existing solutions often require large and expensive datasets and/or lack robustness and interpretability.
We propose the Spatial Adaptation Layer (SAL), which can be prepended to any biosignal array model.
We also introduce learnable baseline normalization (LBN) to reduce baseline fluctuations.
arXiv Detail & Related papers (2024-09-12T14:06:12Z) - Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network via the combination of a convolutional neural network (CNN) and transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z) - MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation [44.74056930805525]
We introduce a novel Masked Diffusion Transformer for co-speech gesture generation, referred to as MDT-A2G.
This model employs a mask modeling scheme specifically designed to strengthen temporal relation learning among sequence gestures.
Experimental results demonstrate that MDT-A2G excels in gesture generation, boasting a learning speed that is over 6$\times$ faster than traditional diffusion transformers.
arXiv Detail & Related papers (2024-08-06T17:29:01Z) - Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - Adversarial Masking Contrastive Learning for vein recognition [10.886119051977785]
Vein recognition has received increasing attention due to its high security and privacy.
Deep neural networks such as convolutional neural networks (CNNs) and Transformers have been introduced for vein recognition.
Despite the recent advances, existing solutions for finger-vein feature extraction are still not optimal due to scarce training image samples.
arXiv Detail & Related papers (2024-01-16T03:09:45Z) - Tackling Electrode Shift In Gesture Recognition with HD-EMG Electrode Subsets [0.8192907805418583]
We propose training on a collection of input channel subsets and augmenting our training distribution with data from different electrode locations.
Our method increases robustness against electrode shift and results in significantly higher intersession performance across subjects and classification algorithms.
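The channel-subset idea in this entry can be sketched as two small utilities: sampling random electrode subsets for training, and augmenting with simulated electrode shift. This is a hypothetical sketch, not the authors' code; the 8x8 grid, circular shift (a crude stand-in, since a real shift would expose new tissue at the border rather than wrap around), and subset sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def shift_augment(grid_emg, max_shift=2):
    """Simulate electrode shift by circularly shifting the electrode grid.

    grid_emg: array of shape (rows, cols, time), e.g. a hypothetical 8x8 HD-EMG grid.
    Circular wrap-around is an approximation used here for simplicity.
    """
    dr = int(rng.integers(-max_shift, max_shift + 1))
    dc = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(grid_emg, shift=(dr, dc), axis=(0, 1))

def channel_subsets(n_channels, subset_size, n_subsets):
    """Sample random channel subsets to train on (without replacement within a subset)."""
    return [rng.choice(n_channels, size=subset_size, replace=False)
            for _ in range(n_subsets)]

grid = rng.standard_normal((8, 8, 512))          # 8x8 grid, 512 time samples (made-up)
augmented = shift_augment(grid)                   # shifted view of the same recording
subsets = channel_subsets(64, subset_size=32, n_subsets=4)
```

Training a classifier on many such subsets and shifted views is one plausible way to realize the robustness the entry describes.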
arXiv Detail & Related papers (2024-01-05T12:13:00Z) - Masked Motion Predictors are Strong 3D Action Representation Learners [143.9677635274393]
In 3D human action recognition, limited supervised data makes it challenging to fully tap into the modeling potential of powerful networks such as transformers.
We show that instead of following the prevalent pretext to perform masked self-component reconstruction in human joints, explicit contextual motion modeling is key to the success of learning effective feature representation for 3D action recognition.
arXiv Detail & Related papers (2023-08-14T11:56:39Z) - Calibrated Hyperspectral Image Reconstruction via Graph-based Self-Tuning Network [40.71031760929464]
Hyperspectral imaging (HSI) has attracted increasing research attention, especially for the ones based on a coded snapshot spectral imaging (CASSI) system.
Existing deep HSI reconstruction models are generally trained on paired data to retrieve original signals upon 2D compressed measurements given by a particular optical hardware mask in CASSI.
This mask-specific training style will lead to a hardware miscalibration issue, which sets up barriers to deploying deep HSI models among different hardware and noisy environments.
We propose a novel Graph-based Self-Tuning (GST) network to reason uncertainties adapting to varying spatial structures of masks among
arXiv Detail & Related papers (2021-12-31T09:39:13Z) - Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction [127.20208645280438]
Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement.
Modeling the inter-spectra interactions is beneficial for HSI reconstruction.
Mask-guided Spectral-wise Transformer (MST) proposes a novel framework for HSI reconstruction.
arXiv Detail & Related papers (2021-11-15T16:59:48Z) - TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection [53.98866801690342]
3D mask face presentation attack detection (PAD) plays a vital role in securing face recognition systems from 3D mask attacks.
We propose a pure rPPG transformer (TransRPPG) framework for learning live intrinsicness representation efficiently.
Our TransRPPG is lightweight and efficient (with only 547K parameters and 763MOPs), which is promising for mobile-level applications.
arXiv Detail & Related papers (2021-04-15T12:33:13Z) - Mask Attention Networks: Rethinking and Strengthen Transformer [70.95528238937861]
Transformer is an attention-based neural network, which consists of two sublayers: the Self-Attention Network (SAN) and the Feed-Forward Network (FFN).
arXiv Detail & Related papers (2021-03-25T04:07:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information (including all listed content) and is not responsible for any consequences.