Two-stream Fusion Model for Dynamic Hand Gesture Recognition using
3D-CNN and 2D-CNN Optical Flow guided Motion Template
- URL: http://arxiv.org/abs/2007.08847v1
- Date: Fri, 17 Jul 2020 09:20:20 GMT
- Title: Two-stream Fusion Model for Dynamic Hand Gesture Recognition using
3D-CNN and 2D-CNN Optical Flow guided Motion Template
- Authors: Debajit Sarma, V. Kavyasree and M.K. Bhuyan
- Abstract summary: Proper detection and tracking of the moving hand are challenging due to the varied shape and size of the hand. This work proposes a two-stream fusion model for hand gesture recognition and a compact yet efficient motion template based on optical flow.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Hand gestures are a useful tool for many applications in the
human-computer interaction community. Hand gesture techniques can be applied
across a broad range of areas, notably sign language recognition and robotic
surgery. In hand gesture recognition, proper detection and tracking of the
moving hand are challenging due to the varied shape and size of the hand. The
objective here is to track the movement of the hand irrespective of its shape,
size, and color, and for this a motion template guided by optical flow (OFMT)
is proposed. The OFMT is a compact representation of the motion information of
a gesture encoded into a single image. In the experiments, datasets using a
bare hand with an open palm and a folded palm wearing a green glove are used,
and in both cases the OFMT images could be generated with equal precision.
Recently, deep network-based techniques have shown impressive improvements
over conventional hand-crafted feature-based techniques. Moreover, the
literature shows that using multiple streams with informative input data helps
to increase recognition accuracy. This work proposes a two-stream fusion model
for hand gesture recognition along with a compact yet efficient motion
template based on optical flow. Specifically, the network consists of two
streams: a 3D convolutional neural network (C3D) that takes gesture videos as
input, and a 2D-CNN that takes OFMT images as input. C3D has shown its
efficiency in capturing the spatio-temporal information of a video, whereas
the OFMT helps to eliminate irrelevant gestures and provides additional motion
information. Though each stream can work independently, they are combined with
a fusion scheme to boost the recognition results. We show the efficiency of
the proposed two-stream network on two databases.
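The abstract gives no reference implementation, but the OFMT idea (dense optical flow accumulated into a single motion image) can be sketched as follows. This minimal illustration uses OpenCV's Farneback flow; the linear temporal weighting and the min-max normalization are assumptions for illustration, not the authors' exact formulation.

```python
import cv2
import numpy as np

def compute_ofmt(frames):
    """Build a single optical-flow guided motion template (OFMT) image
    from a list of BGR video frames. The temporal weighting and the
    normalization below are illustrative assumptions."""
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    template = np.zeros(prev.shape, dtype=np.float32)
    n = len(frames) - 1
    for t, frame in enumerate(frames[1:], start=1):
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense Farneback optical flow between consecutive frames.
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Later motion gets a larger weight, so the template also
        # encodes the temporal order of the movement.
        template = np.maximum(template, (t / n) * mag)
        prev = curr
    # Scale to an 8-bit image usable as input to the 2D-CNN stream.
    return cv2.normalize(template, None, 0, 255,
                         cv2.NORM_MINMAX).astype(np.uint8)
```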
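The two-stream design described above can likewise be sketched in PyTorch: a small C3D-style stream for the raw clip, a 2D-CNN stream for the OFMT image, and fusion of the per-stream class scores. The layer widths and the equal-weight late (score-averaging) fusion are illustrative assumptions; the abstract states only that a fusion scheme combines the two streams.

```python
import torch
import torch.nn as nn

class TwoStreamGesture(nn.Module):
    """Sketch of a two-stream fusion model: a small C3D-style video
    stream plus a 2D-CNN OFMT stream, with late score fusion.
    Layer sizes and fusion weights are illustrative assumptions."""
    def __init__(self, num_classes):
        super().__init__()
        self.video_stream = nn.Sequential(      # 3D-CNN on gesture clips
            nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(64, num_classes))
        self.ofmt_stream = nn.Sequential(       # 2D-CNN on OFMT images
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes))

    def forward(self, clip, ofmt):
        # clip: (B, 3, T, H, W) video tensor; ofmt: (B, 1, H, W) image.
        # Late fusion: average the class scores of the two streams.
        return 0.5 * (self.video_stream(clip) + self.ofmt_stream(ofmt))
```

A forward pass would be `logits = TwoStreamGesture(num_classes=10)(clip, ofmt)`; either stream can also be evaluated on its own, matching the paper's observation that each stream works independently.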
Related papers
- Hand Gesture Classification Based on Forearm Ultrasound Video Snippets Using 3D Convolutional Neural Networks [2.1301560294088318]
Forearm ultrasound offers detailed information about muscle morphology changes during hand movement which can be used to estimate hand gestures.
Previous work has focused on analyzing 2-Dimensional (2D) ultrasound image frames using techniques such as convolutional neural networks (CNNs).
This study uses 3D CNN based techniques to capture temporal patterns within ultrasound video segments for gesture recognition.
arXiv Detail & Related papers (2024-09-24T19:51:41Z)
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- Realistic Human Motion Generation with Cross-Diffusion Models [30.854425772128568]
The Cross Human Motion Diffusion Model (CrossDiff) integrates 3D and 2D information using a shared transformer network within the training of the diffusion model.
CrossDiff effectively combines the strengths of both representations to generate more realistic motion sequences.
arXiv Detail & Related papers (2023-12-18T07:44:40Z)
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from monocular single RGB images.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
- A deep-learning--based multimodal depth-aware dynamic hand gesture recognition system [5.458813674116228]
We focus on dynamic hand gesture (DHG) recognition using depth-quantized image hand skeleton joint points.
In particular, we explore the effect of using depth-quantized features in CNN and Recurrent Neural Network (RNN) based multi-modal fusion networks.
arXiv Detail & Related papers (2021-07-06T11:18:53Z)
- HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks [71.09275975580009]
HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology.
We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape, either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or with a classical segment-wise Non-Rigid Gravitational Approach (NRGA++).
arXiv Detail & Related papers (2021-07-02T17:59:54Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- Understanding the hand-gestures using Convolutional Neural Networks and Generative Adversarial Networks [0.0]
The system consists of three modules: real-time hand tracking, gesture training, and gesture recognition using Convolutional Neural Networks.
It has been tested on a vocabulary of 36 gestures, including the alphabets and digits, and the results show the effectiveness of the approach.
arXiv Detail & Related papers (2020-11-10T02:20:43Z)
- Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition [10.185425416255294]
We propose to use residual frames as an alternative "lightweight" motion representation.
We also develop a new pseudo-3D convolution module which decouples 3D convolution into 2D and 1D convolution.
arXiv Detail & Related papers (2020-08-03T17:40:17Z)
- Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [87.17505994436308]
We build upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings.
We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone.
Our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input.
arXiv Detail & Related papers (2020-07-23T22:58:15Z)
- Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data [77.34069717612493]
We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100fps.
This is enabled by a new learning based architecture designed such that it can make use of all the sources of available hand training data.
It features a 3D hand joint detection module and an inverse kinematics module which regresses not only 3D joint positions but also maps them to joint rotations in a single feed-forward pass.
arXiv Detail & Related papers (2020-03-21T03:51:54Z)