Design of Human Machine Interface through vision-based low-cost Hand
Gesture Recognition system based on deep CNN
- URL: http://arxiv.org/abs/2207.03112v2
- Date: Mon, 11 Jul 2022 13:55:30 GMT
- Title: Design of Human Machine Interface through vision-based low-cost Hand
Gesture Recognition system based on deep CNN
- Authors: Abir Sen, Tapas Kumar Mishra and Ratnakar Dash
- Abstract summary: A real-time human-computer interface (HCI) based on hand gesture recognition is presented.
The system consists of six stages: (1) hand detection, (2) gesture segmentation, (3) classification with six pre-trained CNN models via transfer learning, (4) building an interactive human-machine interface, (5) development of a gesture-controlled virtual mouse, and (6) Kalman-filter-based smoothing of the pointer motion.
- Score: 3.5665681694253903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, a real-time human-computer interface (HCI) based on hand
gesture recognition is presented. The system consists of six stages: (1) hand
detection, (2) gesture segmentation, (3) classification with six pre-trained CNN
models via transfer learning, (4) building an interactive human-machine
interface, (5) development of a gesture-controlled virtual mouse, and (6) use of
a Kalman filter to estimate the hand position, which improves the smoothness of
the pointer motion. Six pre-trained convolutional neural network (CNN) models
(VGG16, VGG19, ResNet50, ResNet101, Inception-V1, and MobileNet-V1) have been
used to classify hand gesture images. Three multi-class datasets (two public
and one custom) have been used to evaluate the models' performance. Inception-V1
shows significantly better classification performance than the other five
pre-trained models in terms of accuracy, precision, recall, and F-score. The
gesture recognition system is extended to control multimedia applications (such
as the VLC player, an audio player, file management, and the 2D Super-Mario-Bros
game) with customized gesture commands in real-time scenarios. The average speed
of the system reaches 35 fps (frames per second), which meets the requirements
of real-time scenarios.
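The Kalman-filter smoothing of stage (6) can be sketched as follows; this is a minimal constant-velocity filter over 2D cursor coordinates, an illustrative assumption rather than the authors' exact formulation (the noise variances and the 35 fps time step are placeholders):

```python
import numpy as np

class CursorKalman:
    """Constant-velocity Kalman filter for smoothing 2D pointer positions."""

    def __init__(self, dt=1 / 35, process_var=50.0, meas_var=25.0):
        # State: [x, y, vx, vy]; only (x, y) is observed per frame.
        self.x = np.zeros(4)
        self.P = np.eye(4) * 1e3
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * meas_var

    def update(self, zx, zy):
        # Predict step: propagate state and covariance.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct step: fuse the measured hand position.
        z = np.array([zx, zy], dtype=float)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[0], self.x[1]  # smoothed (x, y)
```

Feeding each frame's detected hand position through `update` yields a jitter-reduced coordinate that can drive the virtual mouse.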
Related papers
- Novel Human Machine Interface via Robust Hand Gesture Recognition System using Channel Pruned YOLOv5s Model [4.0194015554916644]
This paper develops an efficient hand gesture detection and classification model using a channel-pruned YOLOv5s model.
Our proposed method paves the way for deploying a pruned YOLOv5s model for a real-time gesture-command-based HCI.
The average detection speed of our proposed system has reached more than 60 frames per second (fps) in real-time.
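Channel pruning of the kind applied to the YOLOv5s model above can be illustrated with a toy L1-norm criterion (a common pruning heuristic; the paper's actual criterion and ratio are not specified here and this sketch is an assumption):

```python
import numpy as np

def prune_channels(weights, keep_ratio=0.5):
    """Keep the output channels of a conv layer with the largest L1 norms.

    weights: array of shape (out_channels, in_channels, kh, kw).
    Returns the pruned weight tensor and the indices of the kept channels.
    """
    # L1 norm of each output channel's filter.
    l1 = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    # Indices of the strongest channels, restored to their original order.
    keep = np.sort(np.argsort(l1)[::-1][:n_keep])
    return weights[keep], keep
```

In a full network, the input channels of each downstream layer must be pruned to match the kept indices.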
arXiv Detail & Related papers (2024-07-02T18:10:20Z)
- MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition [49.52436478739151]
Large-scale pre-trained models have demonstrated impressive performance in vision and language tasks within open-world scenarios.
Recent methods utilize language-image pre-training to realize zero-shot 3D shape recognition.
This paper aims to improve the confidence with view selection and hierarchical prompts.
arXiv Detail & Related papers (2023-11-30T09:51:53Z)
- Agile gesture recognition for capacitive sensing devices: adapting on-the-job [55.40855017016652]
We demonstrate a hand gesture recognition system that uses signals from capacitive sensors embedded in the etee hand controller.
The controller generates real-time signals from each of the wearer's five fingers.
We use a machine learning technique to analyse the time-series signals and identify three features that can represent the five fingers within 500 ms.
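Per-finger feature extraction of this kind can be sketched as below; the three features chosen here (mean, standard deviation, and peak-to-peak range) are illustrative assumptions, not the paper's actual features:

```python
import numpy as np

def finger_features(window):
    """Extract three summary features per finger channel.

    window: array of shape (n_samples, 5), one column per finger,
    covering e.g. a 500 ms capture window of capacitive readings.
    Returns an array of shape (5, 3): one feature row per finger.
    """
    mean = window.mean(axis=0)                      # average signal level
    std = window.std(axis=0)                        # signal variability
    ptp = window.max(axis=0) - window.min(axis=0)   # peak-to-peak range
    return np.stack([mean, std, ptp], axis=1)
```

The resulting (5, 3) feature matrix can then be flattened and fed to a lightweight classifier.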
arXiv Detail & Related papers (2023-05-12T17:24:02Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models to downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Keypoint Message Passing for Video-based Person Re-Identification [106.41022426556776]
Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras.
Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement.
In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph.
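A spatial-temporal keypoint graph of this kind can be sketched as an adjacency structure in which spatial edges link joints within a frame and temporal edges link the same joint across consecutive frames (a generic construction assumed for illustration, not the paper's exact graph):

```python
import numpy as np

def build_st_graph(n_frames, skeleton_edges, n_joints):
    """Adjacency matrix of a spatial-temporal keypoint graph.

    Nodes are (frame, joint) pairs flattened to frame * n_joints + joint.
    skeleton_edges: list of (joint_a, joint_b) bone connections.
    """
    n = n_frames * n_joints
    adj = np.zeros((n, n), dtype=int)
    for t in range(n_frames):
        base = t * n_joints
        for a, b in skeleton_edges:      # spatial edges within frame t
            adj[base + a, base + b] = adj[base + b, base + a] = 1
        if t + 1 < n_frames:             # temporal edges to frame t + 1
            for j in range(n_joints):
                adj[base + j, base + n_joints + j] = 1
                adj[base + n_joints + j, base + j] = 1
    return adj
```

Message passing then aggregates keypoint features along these edges.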
arXiv Detail & Related papers (2021-11-16T08:01:16Z)
- On-device Real-time Hand Gesture Recognition [1.4658400971135652]
We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera.
We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space.
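Once 21 hand keypoints are available from a skeleton tracker such as MediaPipe Hands, a static gesture can be classified with simple geometric rules; the finger-extension test below is an illustrative assumption, not the paper's classifier:

```python
import numpy as np

# MediaPipe-style hand layout: 21 landmarks, wrist at index 0,
# fingertips at 8, 12, 16, 20 (thumb ignored here for simplicity).
FINGERTIPS = [8, 12, 16, 20]
PIP_JOINTS = [6, 10, 14, 18]  # middle (PIP) joint of each finger

def count_extended_fingers(keypoints):
    """Count fingers whose tip lies farther from the wrist than its PIP joint.

    keypoints: array of shape (21, 2) with (x, y) per landmark.
    """
    wrist = keypoints[0]
    count = 0
    for tip, pip in zip(FINGERTIPS, PIP_JOINTS):
        if np.linalg.norm(keypoints[tip] - wrist) > np.linalg.norm(keypoints[pip] - wrist):
            count += 1
    return count
```

Mapping finger counts (or similar geometric predicates) to gesture labels gives a lightweight static-gesture recognizer on top of the tracker.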
arXiv Detail & Related papers (2021-10-29T18:33:25Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while utilizing much less trainable parameters and achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks [71.09275975580009]
HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology.
We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape, either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or with a classical segment-wise Non-Rigid Gravitational Approach (NRGA++).
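The 3D voxelized grid representation mentioned above can be sketched as a simple occupancy-grid voxelization of hand surface points (the grid size and bounding-box normalization are assumptions for illustration):

```python
import numpy as np

def voxelize(points, grid_size=32):
    """Map 3D points into a binary occupancy grid of shape (g, g, g).

    points: array of shape (n, 3); coordinates are normalized to the
    point cloud's bounding box before binning into voxels.
    """
    mins = points.min(axis=0)
    extent = points.max(axis=0) - mins
    extent[extent == 0] = 1.0  # avoid division by zero on flat axes
    idx = ((points - mins) / extent * (grid_size - 1)).astype(int)
    grid = np.zeros((grid_size,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

As the entry notes, such a grid does not preserve mesh topology, which is why a mesh registration step is needed on top of it.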
arXiv Detail & Related papers (2021-07-02T17:59:54Z)
- DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework [9.128828609564522]
Real-time recognition of dynamic micro hand gestures from video streams is challenging for in-vehicle scenarios.
We propose a lightweight convolutional neural network (CNN) based architecture which operates online efficiently with a sliding window approach.
Online recognition of gestures has been performed with 3D-MobileNetV2, which provided the best offline accuracy.
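Sliding-window online recognition of the kind described above can be sketched as follows; the window length, stride, and the placeholder per-clip classifier are assumptions, not the paper's configuration:

```python
from collections import deque

def online_gesture_stream(frames, classify_clip, window=8, stride=4):
    """Run a clip classifier over a frame stream with a sliding window.

    classify_clip: callable mapping a list of `window` frames to a label
    (e.g. a 3D CNN such as the 3D-MobileNetV2 mentioned above).
    Yields one label per window, advancing `stride` frames at a time.
    """
    buf = deque(maxlen=window)
    for i, frame in enumerate(frames):
        buf.append(frame)
        # Classify once the buffer is full, then every `stride` frames.
        if len(buf) == window and (i - window + 1) % stride == 0:
            yield classify_clip(list(buf))
```

In practice the emitted labels are usually filtered (e.g. by majority voting) before triggering a command.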
arXiv Detail & Related papers (2020-03-02T14:54:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.