ViT-HGR: Vision Transformer-based Hand Gesture Recognition from High
Density Surface EMG Signals
- URL: http://arxiv.org/abs/2201.10060v1
- Date: Tue, 25 Jan 2022 02:42:50 GMT
- Title: ViT-HGR: Vision Transformer-based Hand Gesture Recognition from High
Density Surface EMG Signals
- Authors: Mansooreh Montazerin, Soheil Zabihi, Elahe Rahimian, Arash Mohammadi,
Farnoosh Naderkhani
- Abstract summary: We investigate and design a Vision Transformer (ViT) based architecture to perform hand gesture recognition from High Density (HD-sEMG) signals.
The proposed ViT-HGR framework can overcome the training time problems and can accurately classify a large number of hand gestures from scratch.
Our experiments with a 64-sample (31.25 ms) window size yield an average test accuracy of 84.62 +/- 3.07%, using only 78,210 parameters.
- Score: 14.419091034872682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been a surge of interest in applying Deep
Learning (DL) models to autonomously perform hand gesture recognition from
surface Electromyogram (sEMG) signals. DL models are, however, mainly designed
to be applied to sparse sEMG signals. Furthermore, due to their complex
structure, they typically face memory constraints, require long training times
and a large number of training samples, and often need to resort to data
augmentation and/or transfer learning. In this paper,
for the first time (to the best of our knowledge), we investigate and design a
Vision Transformer (ViT) based architecture to perform hand gesture recognition
from High Density surface EMG (HD-sEMG) signals. Intuitively speaking, we
capitalize on the recent breakthrough role of the transformer architecture in
tackling complex problems, together with its potential for greater input
parallelization via its attention mechanism. The proposed Vision
Transformer-based Hand Gesture Recognition (ViT-HGR) framework overcomes the
aforementioned training time problems and can accurately classify a large
number of hand gestures from scratch without any need for data augmentation
and/or transfer learning. The efficiency of the proposed ViT-HGR framework is
evaluated using a recently released HD-sEMG dataset consisting of 65 isometric
hand gestures. Our experiments with a 64-sample (31.25 ms) window size yield an
average test accuracy of 84.62 +/- 3.07%, using only 78,210 parameters. The
compact structure of the proposed ViT-HGR framework (i.e., its significantly
reduced number of trainable parameters) shows great potential for practical
application in prosthetic control.
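To make the scale of such a model concrete, below is a minimal PyTorch sketch of a compact ViT-style classifier over HD-sEMG windows. It is not the authors' architecture: the electrode count (64), temporal patch length (8 samples), embedding size, depth, and head count are illustrative assumptions; only the window length (64 samples, i.e. 31.25 ms, implying a 2048 Hz sampling rate) and the 65 gesture classes come from the abstract. It assumes PyTorch >= 1.10.

```python
import torch
import torch.nn as nn


class ViTHGRSketch(nn.Module):
    """Illustrative compact ViT-style classifier for HD-sEMG windows (not the paper's exact model)."""

    def __init__(self, n_channels=64, window=64, patch=8, dim=64,
                 depth=1, heads=4, n_classes=65):
        super().__init__()
        assert window % patch == 0
        self.patch = patch
        # Each non-overlapping temporal patch (all electrodes x `patch` samples) becomes one token.
        self.patch_embed = nn.Linear(n_channels * patch, dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, window // patch + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                                    # x: (batch, channels, samples)
        b, c, t = x.shape
        x = x.reshape(b, c, t // self.patch, self.patch)     # split time axis into patches
        x = x.permute(0, 2, 1, 3).reshape(b, t // self.patch, c * self.patch)
        tokens = self.patch_embed(x)                         # (batch, n_patches, dim)
        cls = self.cls_token.expand(b, -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        return self.head(self.encoder(tokens)[:, 0])         # classify from the [CLS] token


model = ViTHGRSketch()
logits = model(torch.randn(2, 64, 64))                      # two 31.25 ms windows at 2048 Hz
print(logits.shape)                                          # torch.Size([2, 65])
# With these toy settings the model has on the order of 7e4 trainable parameters,
# the same ballpark as the 78,210 reported in the abstract, but not the same model.
print(sum(p.numel() for p in model.parameters()))
```

The point of the sketch is only that a single shallow transformer block over short HD-sEMG windows already stays in the tens of thousands of parameters, which is why such architectures are attractive for embedded prosthetic controllers.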
Related papers
- An LSTM Feature Imitation Network for Hand Movement Recognition from sEMG Signals [2.632402517354116]
We propose utilizing a feature-imitating network (FIN) for closed-form temporal feature learning over a 300ms signal window on Ninapro DB2.
We then explore transfer learning capabilities by applying the pre-trained LSTM-FIN for tuning to a downstream hand movement recognition task.
arXiv Detail & Related papers (2024-05-23T21:45:15Z) - EMGTFNet: Fuzzy Vision Transformer to decode Upperlimb sEMG signals for
Hand Gestures Recognition [0.1611401281366893]
We propose a Vision Transformer (ViT) based architecture with a Fuzzy Neural Block (FNB) called EMGTFNet to perform Hand Gesture Recognition.
The accuracy of the proposed model is tested using the publicly available NinaPro database consisting of 49 different hand gestures.
arXiv Detail & Related papers (2023-09-23T18:55:26Z) - A Deep Learning Sequential Decoder for Transient High-Density
Electromyography in Hand Gesture Recognition Using Subject-Embedded Transfer
Learning [11.170031300110315]
Hand gesture recognition (HGR) has gained significant attention due to the increasing use of AI-powered human-computer interfaces.
These interfaces have a range of applications, including the control of extended reality, agile prosthetics, and exoskeletons.
arXiv Detail & Related papers (2023-09-23T05:32:33Z) - DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
We present the Deformable Attention Transformer (DAT++), an efficient and effective vision backbone for visual recognition.
DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
arXiv Detail & Related papers (2023-09-04T08:26:47Z) - ViTPose++: Vision Transformer for Generic Body Pose Estimation [70.86760562151163]
We show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects.
ViTPose employs the plain and non-hierarchical vision transformer as an encoder to encode features and a lightweight decoder to decode body keypoints.
We empirically demonstrate that the knowledge of large ViTPose models can be easily transferred to small ones via a simple knowledge token.
arXiv Detail & Related papers (2022-12-07T12:33:28Z) - Hand Gesture Recognition Using Temporal Convolutions and Attention
Mechanism [16.399230849853915]
We propose the novel Temporal Convolutions-based Hand Gesture Recognition architecture (TC-HGR) to reduce this computational burden.
We classified 17 hand gestures via surface Electromyogram (sEMG) signals by the adoption of attention mechanisms and temporal convolutions.
The proposed method led to 81.65% and 80.72% classification accuracy for window sizes of 300ms and 200ms, respectively.
arXiv Detail & Related papers (2021-10-17T04:23:59Z) - Vector-quantized Image Modeling with Improved VQGAN [93.8443646643864]
We propose a Vector-quantized Image Modeling approach that involves pretraining a Transformer to predict image tokens autoregressively.
We first propose multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity.
When trained on ImageNet at 256x256 resolution, we achieve Inception Score (IS) of 175.1 and Frechet Inception Distance (FID) of 4.17, a dramatic improvement over the vanilla VQGAN.
arXiv Detail & Related papers (2021-10-09T18:36:00Z) - TEMGNet: Deep Transformer-based Decoding of Upperlimb sEMG for Hand
Gestures Recognition [16.399230849853915]
We develop a framework based on the Transformer architecture for processing sEMG signals.
We propose a novel Vision Transformer (ViT)-based neural network architecture (referred to as the TEMGNet) to classify and recognize upperlimb hand gestures.
arXiv Detail & Related papers (2021-09-25T15:03:22Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision
Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos)
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.