A deep-learning-based multimodal depth-aware dynamic hand gesture recognition system
- URL: http://arxiv.org/abs/2107.02543v1
- Date: Tue, 6 Jul 2021 11:18:53 GMT
- Title: A deep-learning-based multimodal depth-aware dynamic hand gesture recognition system
- Authors: Hasan Mahmud, Mashrur Mahmud Morshed, Md. Kamrul Hasan
- Abstract summary: We focus on dynamic hand gesture (DHG) recognition using depth-quantized image features and hand skeleton joint points.
In particular, we explore the effect of using depth-quantized features in CNN and Recurrent Neural Network (RNN) based multi-modal fusion networks.
- Score: 5.458813674116228
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Any spatio-temporal movement or reorientation of the hand, done with the
intention of conveying a specific meaning, can be considered as a hand gesture.
Inputs to hand gesture recognition systems can be in several forms, such as
depth images, monocular RGB, or skeleton joint points. We observe that raw
depth images possess low contrast in the hand region of interest (ROI). They
do not highlight important details to learn, such as finger-bending information
(whether a finger overlaps the palm or another finger). Recently, in
deep-learning-based dynamic hand gesture recognition, researchers have been
trying to fuse different input modalities (e.g., RGB or depth images and hand
skeleton joint points) to improve recognition accuracy. In this paper, we focus on
dynamic hand gesture (DHG) recognition using depth-quantized image features and
hand skeleton joint points. In particular, we explore the effect of using
depth-quantized features in Convolutional Neural Network (CNN) and Recurrent
Neural Network (RNN) based multi-modal fusion networks. We find that our method
improves existing results on the SHREC-DHG-14 dataset. Furthermore, using our
method, we show that it is possible to reduce the resolution of the input
images by more than four times and still obtain accuracy comparable to or
better than that achieved at the resolutions used in previous methods.
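A minimal sketch of the depth-quantization idea described above, assuming hypothetical function and parameter names (the near/far clipping range and the number of levels are illustrative choices, not values from the paper): clipping a raw depth map to the expected hand range and quantizing it into a few discrete levels raises contrast inside the hand ROI, so that a finger overlapping the palm falls into a different bin than the palm surface behind it.

```python
import numpy as np

def quantize_depth(depth, near_mm=200.0, far_mm=800.0, levels=8):
    """Hypothetical sketch: clip a raw depth map (in millimetres) to an
    assumed hand range and quantize it into a few discrete levels to
    increase contrast in the hand region of interest."""
    # Clip depth values to the assumed near/far range of the hand.
    clipped = np.clip(depth, near_mm, far_mm)
    # Normalize to [0, 1] within that range.
    normalized = (clipped - near_mm) / (far_mm - near_mm)
    # Quantize into `levels` bins; a finger overlapping the palm then
    # lands in a different bin than the palm surface behind it.
    bins = np.minimum(np.floor(normalized * levels).astype(np.int32), levels - 1)
    # Rescale to an 8-bit image usable as CNN input.
    return (bins * (255 // (levels - 1))).astype(np.uint8)

# Example: a synthetic 480x640 depth frame in millimetres.
frame = np.random.uniform(150.0, 900.0, size=(480, 640)).astype(np.float32)
print(quantize_depth(frame).shape)  # (480, 640)
```

Because the quantized bins keep only coarse depth ordering, such features can remain informative under aggressive downsampling, which is consistent with the resolution-reduction result reported above.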
Related papers
- SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition [5.359837526794863]
Hand pose represents key information for action recognition in the egocentric perspective.
We propose to improve egocentric 3D hand pose estimation based on RGB frames only by using pseudo-depth images.
arXiv Detail & Related papers (2024-08-19T14:30:29Z)
- Exploring Deep Learning Image Super-Resolution for Iris Recognition [50.43429968821899]
We propose the use of two deep-learning single-image super-resolution approaches: Stacked Auto-Encoders (SAE) and Convolutional Neural Networks (CNN).
We validate the methods with a database of 1,872 near-infrared iris images; quality-assessment and recognition experiments show the superiority of the deep-learning approaches over the compared algorithms.
arXiv Detail & Related papers (2023-11-02T13:57:48Z)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- Real-Time Hand Gesture Identification in Thermal Images [0.0]
Our system is capable of handling multiple hand regions in a frame and processing them fast enough for real-time applications.
We collected a new thermal image data set with 10 gestures and reported an end-to-end hand gesture recognition accuracy of 97%.
arXiv Detail & Related papers (2023-03-04T05:02:35Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- Understanding the hand-gestures using Convolutional Neural Networks and Generative Adversarial Networks [0.0]
The system consists of three modules: real-time hand tracking, gesture training, and gesture recognition using Convolutional Neural Networks.
It has been tested on a vocabulary of 36 gestures, including alphabets and digits, and the results demonstrate the effectiveness of the approach.
arXiv Detail & Related papers (2020-11-10T02:20:43Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections (a hedged sketch of a 3D-CDC layer follows this entry).
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
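The 3D-CDC family mentioned above can be sketched roughly as follows, extending the published 2D Central Difference Convolution trick (a vanilla convolution minus a theta-weighted convolution with the spatially summed kernel) to 3D. Channel counts and the theta value here are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDC3D(nn.Module):
    """Sketch of a 3D Central Difference Convolution layer; the sizes
    and theta below are illustrative assumptions."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size,
                              padding=padding, bias=False)
        self.theta = theta  # blends vanilla and central-difference terms

    def forward(self, x):
        out = self.conv(x)  # vanilla 3D convolution
        if self.theta == 0:
            return out
        # The central-difference term equals a 1x1x1 convolution with the
        # spatially summed kernel, subtracted from the vanilla output.
        kernel_diff = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        return out - self.theta * F.conv3d(x, kernel_diff)

# Example: a batch of 2 four-channel clips, 8 frames of 32x32 each.
clip = torch.randn(2, 4, 8, 32, 32)
print(CDC3D(4, 16)(clip).shape)  # torch.Size([2, 16, 8, 32, 32])
```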
- Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter [62.26677215668959]
We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then fuse multiple off-the-shelf deep models on these semantically salient regions as the pixel-wise saliency refinement.
Our method is simple yet effective, and is the first attempt to consider salient object detection mainly as an object-level semantic re-ranking problem.
arXiv Detail & Related papers (2020-08-10T07:12:43Z)
- FineHand: Learning Hand Shapes for American Sign Language Recognition [16.862375555609667]
We present an approach for effective learning of hand shape embeddings, which are discriminative for ASL gestures.
For hand shape recognition, our method uses a mix of manually labelled hand shapes and high-confidence predictions to train a deep convolutional neural network (CNN).
We will demonstrate that higher-quality hand shape models can significantly improve the accuracy of final video gesture classification.
arXiv Detail & Related papers (2020-03-04T23:32:08Z)
- 3D dynamic hand gestures recognition using the Leap Motion sensor and convolutional neural networks [0.0]
We present a method for the recognition of a set of non-static gestures acquired through the Leap Motion sensor.
The acquired gesture information is converted into color images, in which the variations of the hand joint positions during the gesture are projected onto a plane (a sketch of this projection follows this entry).
The classification of the gestures is performed using a deep Convolutional Neural Network (CNN).
arXiv Detail & Related papers (2020-03-03T11:05:35Z)
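As a loose illustration of the trajectory-to-image conversion described in the Leap Motion entry above (the canvas size, channel encoding, and function name are all assumptions), per-frame joint positions can be projected onto the x-y plane and rasterized into a color image that a 2D CNN can then classify:

```python
import numpy as np

def gesture_to_image(joints_xyz, size=64):
    """Hypothetical sketch: project per-frame 3D joint positions onto the
    x-y plane and accumulate them into a color image, encoding joint index
    in the red channel and time in the green channel."""
    frames, num_joints, _ = joints_xyz.shape
    image = np.zeros((size, size, 3), dtype=np.uint8)
    xy = joints_xyz[:, :, :2]
    # Normalize the x-y extent of the whole gesture into [0, size).
    mins = xy.min(axis=(0, 1))
    maxs = xy.max(axis=(0, 1))
    scaled = (xy - mins) / (maxs - mins + 1e-8) * (size - 1)
    for t in range(frames):
        for j in range(num_joints):
            col, row = scaled[t, j].astype(int)
            image[row, col, 0] = int(255 * j / max(num_joints - 1, 1))
            image[row, col, 1] = int(255 * t / max(frames - 1, 1))
    return image

# Example: a fake 30-frame gesture with 20 hand joints.
gesture = np.random.randn(30, 20, 3).astype(np.float32)
print(gesture_to_image(gesture).shape)  # (64, 64, 3)
```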
This list is automatically generated from the titles and abstracts of the papers on this site.