Related papers: Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN

Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN

URL: http://arxiv.org/abs/2406.15003v1
Date: Fri, 21 Jun 2024 09:30:59 GMT
Title: Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN
Authors: Oluwaleke Yusuf, Maki Habib, Mohamed Moustafa,
Abstract summary: This study focuses on Hand Gesture Recognition (HGR), which is vital for perceptual computing across various real-world contexts. We introduce an innovative HGR framework that combines data-level fusion and an Ensemble Tuner Multi-stream CNN architecture. This approach effectively encodes gesture information from the skeleton modality into RGB images, thereby minimizing noise while improving semantic gesture comprehension.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study focuses on Hand Gesture Recognition (HGR), which is vital for perceptual computing across various real-world contexts. The primary challenge in the HGR domain lies in dealing with the individual variations inherent in human hand morphology. To tackle this challenge, we introduce an innovative HGR framework that combines data-level fusion and an Ensemble Tuner Multi-stream CNN architecture. This approach effectively encodes spatiotemporal gesture information from the skeleton modality into RGB images, thereby minimizing noise while improving semantic gesture comprehension. Our framework operates in real-time, significantly reducing hardware requirements and computational complexity while maintaining competitive performance on benchmark datasets such as SHREC2017, DHG1428, FPHA, LMDHG and CNR. This improvement in HGR demonstrates robustness and paves the way for practical, real-time applications that leverage resource-limited devices for human-machine interaction and ambient intelligence.

Related papers

MHARFedLLM: Multimodal Human Activity Recognition Using Federated Large Language Model [26.112543245882076]
Human Activity Recognition (HAR) plays a vital role in applications such as fitness tracking, smart homes, and healthcare monitoring.<n>Traditional HAR systems often rely on single modalities, such as motion sensors or cameras, limiting robustness and accuracy in real-world environments.<n>This work presents FedTime-MAGNET, a novel multimodal federated learning framework that advances HAR by combining heterogeneous data sources.
arXiv Detail & Related papers (2025-08-03T10:05:06Z)
AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance? [49.64902130083662]
This paper proposes a comprehensive data augmentation framework that integrates geometric transformations, random variations, rotation, zooming and intensity-based transformations.<n>The proposed augmentation strategy is evaluated on three models: multi-stream e2eET, FPPR point cloud-based hand gesture recognition (HGR), and DD-Network.
arXiv Detail & Related papers (2025-06-08T16:43:05Z)
Online hand gesture recognition using Continual Graph Transformers [1.3927943269211591]
We propose a novel online recognition system designed for real-time skeleton sequence streaming. Our approach achieves state-of-the-art accuracy and significantly reduces false positive rates, making it a compelling solution for real-time applications. The proposed system can be seamlessly integrated into various domains, including human-robot collaboration and assistive technologies.
arXiv Detail & Related papers (2025-02-20T17:27:55Z)
ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships. Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands. We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z)
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining [67.87810796668981]
Information-Sensitive Cropping (ISC) and Self-Refining Dual Learning (SRDL) Iris achieves state-of-the-art performance across multiple benchmarks with only 850K GUI annotations. These improvements translate to significant gains in both web and OS agent downstream tasks.
arXiv Detail & Related papers (2024-12-13T18:40:10Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
We propose a pure Mamba-based framework (MambaVT) to fully exploit intrinsic-temporal contextual modeling for robust visible-thermal tracking. Specifically, we devise the long-range cross-frame integration component to globally adapt to target appearance variations. Experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks.
arXiv Detail & Related papers (2024-08-15T02:29:00Z)
Deep-Graph-Sprints: Accelerated Representation Learning in Continuous-Time Dynamic Graphs [4.372841335228306]
Continuous-time dynamic graphs (CTDGs) are essential for modeling interconnected, evolving systems. Deep-Graph-Sprints (DGS) is a novel deep learning architecture designed for efficient representation learning on CTDGs with low-latency inference requirements.
arXiv Detail & Related papers (2024-07-10T14:44:25Z)
DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation. Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details. Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for Enhanced Indoor View Synthesis [51.49008959209671]
We introduce VoxNeRF, a novel approach that leverages volumetric representations to enhance the quality and efficiency of indoor view synthesis. We employ multi-resolution hash grids to adaptively capture spatial features, effectively managing occlusions and the intricate geometry of indoor scenes. We validate our approach against three public indoor datasets and demonstrate that VoxNeRF outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-11-09T11:32:49Z)
A Multi-label Classification Approach to Increase Expressivity of EMG-based Gesture Recognition [4.701158597171363]
The aim of this study is to efficiently increase the expressivity of surface electromyography-based (sEMG) gesture recognition systems. We use a problem transformation approach, in which actions were subset into two biomechanically independent components.
arXiv Detail & Related papers (2023-09-13T20:21:41Z)
SynthoGestures: A Novel Framework for Synthetic Dynamic Hand Gesture Generation for Driving Scenarios [17.94374027261511]
We propose a framework to synthesize realistic hand gestures using Unreal Engine. Our framework offers customization options and reduces the risk of overfitting. By saving time and effort in the creation of the data set, our tool accelerates the development of gesture recognition systems for automotive applications.
arXiv Detail & Related papers (2023-09-08T16:32:56Z)
Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output. Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion. We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z)
Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain. In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned with cheap operations to relieve the computation burden. Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework [9.128828609564522]
Real-time recognition of dynamic micro hand gestures from video streams is challenging for in-vehicle scenarios. We propose a lightweight convolutional neural network (CNN) based architecture which operates online efficiently with a sliding window approach. Online recognition of gestures has been performed with 3D-MobileNetV2, which provided the best offline accuracy.
arXiv Detail & Related papers (2020-03-02T14:54:19Z)
LE-HGR: A Lightweight and Efficient RGB-based Online Gesture Recognition Network for Embedded AR Devices [8.509059894058947]
We propose a lightweight and computationally efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition on embedded devices with low computing power. We show that the proposed method is of high accuracy and robustness, which is able to reach high-end performance in a variety of complicated interaction environments.
arXiv Detail & Related papers (2020-01-16T05:23:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.