Two-stream Fusion Model for Dynamic Hand Gesture Recognition using
3D-CNN and 2D-CNN Optical Flow guided Motion Template
- URL: http://arxiv.org/abs/2007.08847v1
- Date: Fri, 17 Jul 2020 09:20:20 GMT
- Title: Two-stream Fusion Model for Dynamic Hand Gesture Recognition using
3D-CNN and 2D-CNN Optical Flow guided Motion Template
- Authors: Debajit Sarma, V. Kavyasree and M.K. Bhuyan
- Abstract summary: Proper detection and tracking of the moving hand are challenging due to the varied shape and size of the hand. This work proposes a two-stream fusion model for hand gesture recognition and a compact yet efficient motion template based on optical flow.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Hand gestures are a useful tool for many applications in the
human-computer interaction community. Hand gesture techniques can be applied
across a broad range of areas, notably sign language recognition and robotic
surgery. In hand gesture recognition, proper detection and tracking of the
moving hand are challenging due to the varied shape and size of the hand. The
objective here is to track the movement of the hand irrespective of its shape,
size, and color, and for this a motion template guided by optical flow (OFMT)
is proposed. The OFMT is a compact representation of the motion information of
a gesture encoded into a single image. In the experiments, datasets using a
bare hand with an open palm and a folded palm wearing a green glove are used,
and in both cases the OFMT images could be generated with equal precision.
Recently, deep network-based techniques have shown impressive improvements
over conventional hand-crafted feature-based techniques. Moreover, the
literature shows that using multiple streams with informative input data helps
to increase recognition accuracy. This work proposes a two-stream fusion model
for hand gesture recognition along with a compact yet efficient motion
template based on optical flow. Specifically, the network consists of two
streams: a 3D convolutional neural network (C3D) that takes gesture videos as
input, and a 2D-CNN that takes OFMT images as input. C3D has shown its
efficiency in capturing the spatio-temporal information of a video, whereas
the OFMT helps to eliminate irrelevant gestures and provides additional motion
information. Though each stream can work independently, they are combined with
a fusion scheme to boost the recognition results. We show the efficiency of
the proposed two-stream network on two databases.
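The abstract gives no reference implementation, but the OFMT idea (dense optical flow accumulated into a single motion image) can be sketched as follows. This minimal illustration uses OpenCV's Farneback flow; the linear temporal weighting and the min-max normalization are assumptions for illustration, not the authors' exact formulation.

```python
import cv2
import numpy as np

def compute_ofmt(frames):
    """Build a single optical-flow guided motion template (OFMT) image
    from a list of BGR video frames. The temporal weighting and the
    normalization below are illustrative assumptions."""
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    template = np.zeros(prev.shape, dtype=np.float32)
    n = len(frames) - 1
    for t, frame in enumerate(frames[1:], start=1):
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense Farneback optical flow between consecutive frames.
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Later motion gets a larger weight, so the template also
        # encodes the temporal order of the movement.
        template = np.maximum(template, (t / n) * mag)
        prev = curr
    # Scale to an 8-bit image usable as input to the 2D-CNN stream.
    return cv2.normalize(template, None, 0, 255,
                         cv2.NORM_MINMAX).astype(np.uint8)
```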
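The two-stream design described above can likewise be sketched in PyTorch: a small C3D-style stream for the raw clip, a 2D-CNN stream for the OFMT image, and fusion of the per-stream class scores. The layer widths and the equal-weight late (score-averaging) fusion are illustrative assumptions; the abstract states only that a fusion scheme combines the two streams.

```python
import torch
import torch.nn as nn

class TwoStreamGesture(nn.Module):
    """Sketch of a two-stream fusion model: a small C3D-style video
    stream plus a 2D-CNN OFMT stream, with late score fusion.
    Layer sizes and fusion weights are illustrative assumptions."""
    def __init__(self, num_classes):
        super().__init__()
        self.video_stream = nn.Sequential(      # 3D-CNN on gesture clips
            nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(64, num_classes))
        self.ofmt_stream = nn.Sequential(       # 2D-CNN on OFMT images
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes))

    def forward(self, clip, ofmt):
        # clip: (B, 3, T, H, W) video tensor; ofmt: (B, 1, H, W) image.
        # Late fusion: average the class scores of the two streams.
        return 0.5 * (self.video_stream(clip) + self.ofmt_stream(ofmt))
```

A forward pass would be `logits = TwoStreamGesture(num_classes=10)(clip, ofmt)`; either stream can also be evaluated on its own, matching the paper's observation that each stream works independently.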
Related papers
- Hand Gesture Classification Based on Forearm Ultrasound Video Snippets Using 3D Convolutional Neural Networks [2.1301560294088318]
Forearm ultrasound offers detailed information about muscle morphology changes during hand movement which can be used to estimate hand gestures.
Previous work has focused on analyzing 2-Dimensional (2D) ultrasound image frames using techniques such as convolutional neural networks (CNNs).
This study uses 3D CNN based techniques to capture temporal patterns within ultrasound video segments for gesture recognition.
arXiv Detail & Related papers (2024-09-24T19:51:41Z)
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- Realistic Human Motion Generation with Cross-Diffusion Models [30.854425772128568]
The Cross Human Motion Diffusion Model (CrossDiff) integrates 3D and 2D information using a shared transformer network within the training of the diffusion model.
CrossDiff effectively combines the strengths of both representations to generate more realistic motion sequences.
arXiv Detail & Related papers (2023-12-18T07:44:40Z)
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from monocular single RGB images.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
- A deep-learning--based multimodal depth-aware dynamic hand gesture recognition system [5.458813674116228]
We focus on dynamic hand gesture (DHG) recognition using depth-quantized image hand skeleton joint points.
In particular, we explore the effect of using depth-quantized features in CNN and Recurrent Neural Network (RNN) based multi-modal fusion networks.
arXiv Detail & Related papers (2021-07-06T11:18:53Z)
- HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks [71.09275975580009]
HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology.
We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape, either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or with a classical segment-wise Non-Rigid Gravitational Approach (NRGA++).
arXiv Detail & Related papers (2021-07-02T17:59:54Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- Understanding the hand-gestures using Convolutional Neural Networks and Generative Adversarial Networks [0.0]
The system consists of three modules: real-time hand tracking, gesture training, and gesture recognition using Convolutional Neural Networks.
It has been tested on a vocabulary of 36 gestures, including the alphabets and digits, and the results show the effectiveness of the approach.
arXiv Detail & Related papers (2020-11-10T02:20:43Z)
- Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition [10.185425416255294]
We propose to use residual frames as an alternative "lightweight" motion representation.
We also develop a new pseudo-3D convolution module which decouples 3D convolution into 2D and 1D convolution.
arXiv Detail & Related papers (2020-08-03T17:40:17Z)
- Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [87.17505994436308]
We build upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings.
We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone.
Our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input.
arXiv Detail & Related papers (2020-07-23T22:58:15Z)
- Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data [77.34069717612493]
We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100fps.
This is enabled by a new learning based architecture designed such that it can make use of all the sources of available hand training data.
It features a 3D hand joint detection module and an inverse kinematics module which regresses not only 3D joint positions but also maps them to joint rotations in a single feed-forward pass.
arXiv Detail & Related papers (2020-03-21T03:51:54Z)