Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN
- URL: http://arxiv.org/abs/2406.15003v2
- Date: Sun, 06 Oct 2024 04:06:44 GMT
- Authors: Oluwaleke Yusuf, Maki Habib, Mohamed Moustafa
- Abstract summary: Hand Gesture Recognition (HGR) enables intuitive human-computer interactions in various real-world contexts.
Existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications.
This study introduces a robust, skeleton-based framework for dynamic HGR that reduces the recognition of dynamic hand gestures to a static image classification task.
- Abstract: Hand Gesture Recognition (HGR) enables intuitive human-computer interactions in various real-world contexts. However, existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications. This study introduces a robust, skeleton-based framework for dynamic HGR that simplifies the recognition of dynamic hand gestures into a static image classification task, effectively reducing both hardware and computational demands. Our framework utilizes a data-level fusion technique to encode 3D skeleton data from dynamic gestures into static RGB spatiotemporal images. It incorporates a specialized end-to-end Ensemble Tuner (e2eET) Multi-Stream CNN architecture that optimizes the semantic connections between data representations while minimizing computational needs. Tested across five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR), the framework showed competitive performance with the state-of-the-art. Its capability to support real-time HGR applications was also demonstrated through deployment on standard consumer PC hardware, showcasing low latency and minimal resource usage in real-world settings. The successful deployment of this framework underscores its potential to enhance real-time applications in fields such as virtual/augmented reality, ambient intelligence, and assistive technologies, providing a scalable and efficient solution for dynamic gesture recognition.
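The data-level fusion idea in the abstract, turning a sequence of 3D skeleton frames into one static RGB image, can be illustrated with a minimal sketch. This is not the authors' encoding: the orthographic x-y projection, the blue-to-red time ramp, the use of the green channel for depth, and the function name `encode_skeleton_sequence` are all assumptions made here for illustration.

```python
import numpy as np


def encode_skeleton_sequence(seq, size=64):
    """Encode a (T, J, 3) skeleton sequence as a (size, size, 3) uint8 RGB image.

    Illustrative assumptions (not taken from the paper):
    - joints are projected orthographically onto the x-y plane,
    - time is encoded as a blue-to-red color ramp,
    - normalized depth (z) is written to the green channel.
    """
    seq = np.asarray(seq, dtype=np.float64)
    num_frames = seq.shape[0]

    # Normalize each axis to [0, 1] using bounds over the whole sequence.
    lo = seq.min(axis=(0, 1))
    hi = seq.max(axis=(0, 1))
    span = np.where(hi - lo > 0, hi - lo, 1.0)  # avoid division by zero
    norm = (seq - lo) / span

    img = np.zeros((size, size, 3), dtype=np.uint8)
    for t in range(num_frames):
        phase = t / max(num_frames - 1, 1)  # 0.0 at first frame, 1.0 at last
        cols = np.round(norm[t, :, 0] * (size - 1)).astype(int)
        rows = np.round((1.0 - norm[t, :, 1]) * (size - 1)).astype(int)  # y points up
        img[rows, cols, 0] = int(round(255 * phase))          # red grows with time
        img[rows, cols, 1] = np.round(255 * norm[t, :, 2]).astype(np.uint8)
        img[rows, cols, 2] = int(round(255 * (1.0 - phase)))  # blue fades with time
    return img


if __name__ == "__main__":
    # Hypothetical input: 30 frames of 22 joints with random coordinates.
    rng = np.random.default_rng(0)
    gesture = rng.normal(size=(30, 22, 3))
    image = encode_skeleton_sequence(gesture)
    print(image.shape, image.dtype)
```

Later frames overwrite earlier ones at the same pixel, so the image favors the end of the gesture. Once the sequence is a single image, any standard image classifier can consume it, which is the sense in which a dynamic gesture becomes a static image classification task.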
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
We propose a pure Mamba-based framework (MambaVT) to fully exploit intrinsic-temporal contextual modeling for robust visible-thermal tracking.
Specifically, we devise the long-range cross-frame integration component to globally adapt to target appearance variations.
Experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks.
arXiv Detail & Related papers (2024-08-15T02:29:00Z)
- Deep-Graph-Sprints: Accelerated Representation Learning in Continuous-Time Dynamic Graphs [4.372841335228306]
Continuous-time dynamic graphs (CTDGs) are essential for modeling interconnected, evolving systems.
Deep-Graph-Sprints (DGS) is a novel deep learning architecture designed for efficient representation learning on CTDGs with low-latency inference requirements.
arXiv Detail & Related papers (2024-07-10T14:44:25Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for Enhanced Indoor View Synthesis [51.49008959209671]
We introduce VoxNeRF, a novel approach that leverages volumetric representations to enhance the quality and efficiency of indoor view synthesis.
We employ multi-resolution hash grids to adaptively capture spatial features, effectively managing occlusions and the intricate geometry of indoor scenes.
We validate our approach against three public indoor datasets and demonstrate that VoxNeRF outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-11-09T11:32:49Z)
- A Multi-label Classification Approach to Increase Expressivity of EMG-based Gesture Recognition [4.701158597171363]
The aim of this study is to efficiently increase the expressivity of surface electromyography-based (sEMG) gesture recognition systems.
We use a problem transformation approach, in which actions were subset into two biomechanically independent components.
arXiv Detail & Related papers (2023-09-13T20:21:41Z)
- SynthoGestures: A Novel Framework for Synthetic Dynamic Hand Gesture Generation for Driving Scenarios [17.94374027261511]
We propose a framework to synthesize realistic hand gestures using Unreal Engine.
Our framework offers customization options and reduces the risk of overfitting.
By saving time and effort in the creation of the data set, our tool accelerates the development of gesture recognition systems for automotive applications.
arXiv Detail & Related papers (2023-09-08T16:32:56Z)
- Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed using expensive operations and the lower-frequency part is assigned cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework [9.128828609564522]
Real-time recognition of dynamic micro hand gestures from video streams is challenging for in-vehicle scenarios.
We propose a lightweight convolutional neural network (CNN) based architecture which operates online efficiently with a sliding window approach.
Online recognition of gestures has been performed with 3D-MobileNetV2, which provided the best offline accuracy.
arXiv Detail & Related papers (2020-03-02T14:54:19Z)
- LE-HGR: A Lightweight and Efficient RGB-based Online Gesture Recognition Network for Embedded AR Devices [8.509059894058947]
We propose a lightweight and computationally efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition on embedded devices with low computing power.
We show that the proposed method is highly accurate and robust, reaching high-end performance in a variety of complicated interaction environments.
arXiv Detail & Related papers (2020-01-16T05:23:24Z)