HaGRIDv2: 1M Images for Static and Dynamic Hand Gesture Recognition
- URL: http://arxiv.org/abs/2412.01508v1
- Date: Mon, 02 Dec 2024 14:01:44 GMT
- Title: HaGRIDv2: 1M Images for Static and Dynamic Hand Gesture Recognition
- Authors: Anton Nuzhdin, Alexander Nagaev, Alexander Sautin, Alexander Kapitanov, Karina Kvanchiani
- Abstract summary: This paper proposes the second version of the widespread Hand Gesture Recognition dataset HaGRID -- HaGRIDv2.
We cover 15 new gestures with conversation and control functions, including two-handed ones.
We implemented the dynamic gesture recognition algorithm and further enhanced it by adding three new groups of manipulation gestures.
- Score: 108.45001006078036
- License:
- Abstract: This paper proposes the second version of the widespread Hand Gesture Recognition dataset HaGRID -- HaGRIDv2. We cover 15 new gestures with conversation and control functions, including two-handed ones. Building on the foundational concepts proposed by HaGRID's authors, we implemented the dynamic gesture recognition algorithm and further enhanced it by adding three new groups of manipulation gestures. The "no gesture" class was diversified with samples of natural hand movements, which allowed us to reduce false positives sixfold. With the extra samples combined into HaGRID, the resulting version outperforms the original when pre-training models for gesture-related tasks. We also achieved the best generalization ability among gesture and hand detection datasets. In addition, the second version enhances the quality of the gestures generated by the diffusion model. HaGRIDv2, pre-trained models, and the dynamic gesture recognition algorithm are publicly available.
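The abstract does not spell out the dynamic-gesture algorithm, but its core premise is deriving dynamic gestures from per-frame static predictions while the diversified "no gesture" class suppresses spurious detections. Below is a minimal, hypothetical sketch of that idea; the window size, vote threshold, and label names are illustrative assumptions, not HaGRIDv2's actual parameters.

```python
from collections import deque

# Hypothetical sketch: turn a stream of per-frame static-gesture labels into
# dynamic-gesture events. The voting rule is illustrative, not HaGRIDv2's
# actual algorithm.

WINDOW = 10     # frames to aggregate
MIN_VOTES = 7   # frames that must agree before an event fires

def dynamic_events(frame_labels):
    """Yield (frame_index, gesture) events from per-frame predictions.

    `frame_labels` is an iterable of static labels, e.g. "palm", "fist",
    or "no_gesture". An event fires only when one gesture dominates the
    window, which suppresses sporadic false positives.
    """
    window = deque(maxlen=WINDOW)
    for i, label in enumerate(frame_labels):
        window.append(label)
        if len(window) == WINDOW:
            top = max(set(window), key=window.count)
            if top != "no_gesture" and window.count(top) >= MIN_VOTES:
                yield i, top
                window.clear()  # debounce: avoid repeated events

# Example: a noisy stream in which "palm" should win exactly once.
stream = ["no_gesture"] * 3 + ["palm"] * 8 + ["fist"] + ["palm"] * 4
print(list(dynamic_events(stream)))  # [(9, 'palm')]
```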
Related papers
- Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis [55.45253486141108]
RAG-Gesture is a diffusion-based gesture generation approach that produces semantically rich gestures.
We achieve this by using explicit domain knowledge to retrieve motions from a database of co-speech gestures.
We propose a control paradigm for guidance that allows users to modulate the amount of influence each retrieval insertion has over the generated sequence, as in the sketch below.
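As a rough illustration of that influence control, here is a hedged sketch: the generated motion is linearly blended toward a retrieved exemplar with a user-chosen weight. RAG-Gesture's actual guidance operates inside the diffusion sampler; the function name, shapes, and blending rule below are assumptions.

```python
import numpy as np

def blend_with_retrieval(predicted, retrieved, influence=0.3):
    """Interpolate generated motion toward a retrieved exemplar.

    predicted, retrieved: arrays of shape (frames, joints, 3).
    influence: 0.0 ignores the retrieval, 1.0 copies it outright.
    """
    assert 0.0 <= influence <= 1.0
    return (1.0 - influence) * predicted + influence * retrieved

# Example with random stand-ins for real, time-aligned motion clips.
gen = np.random.randn(60, 25, 3)   # 60 frames, 25 joints
ret = np.random.randn(60, 25, 3)   # retrieved co-speech exemplar
out = blend_with_retrieval(gen, ret, influence=0.5)
print(out.shape)  # (60, 25, 3)
```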
arXiv Detail & Related papers (2024-12-09T18:59:46Z)
- Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars [47.61442517627826]
We propose to create animatable avatars for interacting hands with 3D Gaussian Splatting (GS) and single-image inputs.
Our proposed method is validated via extensive experiments on the large-scale InterHand2.6M dataset.
arXiv Detail & Related papers (2024-10-11T14:14:51Z)
- An Advanced Deep Learning Based Three-Stream Hybrid Model for Dynamic Hand Gesture Recognition [1.7985212575295124]
We propose a novel three-stream hybrid model that combines RGB pixel and skeleton-based features to recognize hand gestures.
We preprocessed the dataset, including augmentation, to make the system invariant to rotation, translation, and scaling.
We produced a powerful feature vector by combining the pixel-based deep learning features with the pose-estimation-based stacked deep learning features, as sketched below.
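A minimal sketch of such late fusion, assuming stand-in feature dimensions and a small PyTorch head; the authors' actual three-stream architecture is not specified here.

```python
import torch
import torch.nn as nn

# Hypothetical fusion sketch: concatenate per-clip features from a pixel
# stream and a pose stream before classification. Backbones and dimensions
# are stand-ins, not the authors' model.

class HybridFusionHead(nn.Module):
    def __init__(self, rgb_dim=512, skel_dim=128, n_classes=27):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(rgb_dim + skel_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, rgb_feat, skel_feat):
        # Late fusion: one joint feature vector per clip.
        fused = torch.cat([rgb_feat, skel_feat], dim=-1)
        return self.classifier(fused)

head = HybridFusionHead()
logits = head(torch.randn(4, 512), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 27])
```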
arXiv Detail & Related papers (2024-08-15T09:05:00Z)
- HaGRID - HAnd Gesture Recognition Image Dataset [79.21033185563167]
This paper introduces HaGRID, an enormous dataset for building hand gesture recognition systems focused on interacting with devices in order to manage them.
Although the gestures are static, they were chosen specifically to enable the design of several dynamic gestures.
HaGRID contains 554,800 images with bounding-box annotations and gesture labels for hand detection and gesture classification tasks.
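A hedged loading sketch: HaGRID's public annotations are JSON files mapping image ids to normalized [x, y, w, h] boxes with gesture labels; treat the exact keys below as assumptions based on the released format.

```python
import json

def load_boxes(annotation_path, img_w, img_h):
    """Return {image_id: [(label, x1, y1, x2, y2), ...]} in pixel coords."""
    with open(annotation_path) as f:
        anns = json.load(f)
    out = {}
    for image_id, record in anns.items():
        boxes = []
        # Assumed keys: "bboxes" holds normalized top-left + size boxes,
        # "labels" holds the matching gesture names.
        for (x, y, w, h), label in zip(record["bboxes"], record["labels"]):
            boxes.append((label, x * img_w, y * img_h,
                          (x + w) * img_w, (y + h) * img_h))
        out[image_id] = boxes
    return out
```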
arXiv Detail & Related papers (2022-06-16T14:41:32Z)
- HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks [71.09275975580009]
HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
HandVoxNet++ relies on two hand shape representations. The first is the 3D voxelized grid of the hand shape, which does not preserve the mesh topology; the second is the hand surface, which does.
We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape, either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or with the classical segment-wise Non-Rigid Gravitational Approach (NRGA++).
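For intuition, the sketch below shows the naive baseline such registration methods improve upon: projecting each mesh vertex onto the nearest occupied voxel center. GCN-MeshReg and NRGA++ are far more sophisticated; everything here, including names and shapes, is illustrative.

```python
import numpy as np

def project_to_voxels(vertices, occupancy, voxel_size=1.0):
    """Move each vertex to the center of the nearest occupied voxel.

    vertices: (N, 3) float array in the voxel grid's coordinate frame.
    occupancy: (D, H, W) boolean grid.
    """
    occ_idx = np.argwhere(occupancy)        # (M, 3) occupied cell indices
    centers = (occ_idx + 0.5) * voxel_size  # voxel centers
    # Nearest occupied center per vertex (brute force for clarity).
    d = np.linalg.norm(vertices[:, None, :] - centers[None, :, :], axis=-1)
    return centers[d.argmin(axis=1)]

grid = np.zeros((8, 8, 8), dtype=bool)
grid[3:5, 3:5, 3:5] = True                  # a small occupied block
verts = np.array([[0.0, 0.0, 0.0], [7.0, 7.0, 7.0]])
print(project_to_voxels(verts, grid))       # snapped to nearest occupied cells
```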
arXiv Detail & Related papers (2021-07-02T17:59:54Z)
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
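As a flavor of what skeleton-based pipelines consume, here is a generic descriptor sketch: pairwise joint distances, which are invariant to hand rotation and translation. This is a textbook baseline, not any contestant's method; the 21-joint layout is an assumption.

```python
import numpy as np

def pairwise_distance_descriptor(joints):
    """joints: (J, 3) hand-skeleton coordinates -> flat descriptor."""
    diffs = joints[:, None, :] - joints[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(joints), k=1)  # upper triangle, no diagonal
    return dists[iu]

frame = np.random.randn(21, 3)  # assumed 21-joint hand skeleton per frame
print(pairwise_distance_descriptor(frame).shape)  # (210,)
```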
arXiv Detail & Related papers (2021-06-21T10:57:49Z)
- A Prototype-Based Generalized Zero-Shot Learning Framework for Hand Gesture Recognition [5.992264231643021]
We propose an end-to-end prototype-based framework for hand gesture recognition with two branches.
The first branch is a prototype-based detector that learns gesture representations.
The second branch is a zero-shot label predictor that takes the features of unseen classes as input and outputs predictions.
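A hedged sketch of the two-branch matching idea: features are compared against prototypes of seen classes and prototypes derived for unseen classes, and the nearest one wins. The cosine rule, shapes, and random stand-in prototypes below are assumptions, not the paper's exact formulation.

```python
import numpy as np

def predict(feature, seen_protos, unseen_protos):
    """Return (label_index, is_unseen) by nearest cosine similarity."""
    protos = np.vstack([seen_protos, unseen_protos])
    sims = protos @ feature / (
        np.linalg.norm(protos, axis=1) * np.linalg.norm(feature) + 1e-8)
    k = int(sims.argmax())
    # Indices past the seen block correspond to unseen (zero-shot) classes.
    return k, k >= len(seen_protos)

rng = np.random.default_rng(0)
seen = rng.normal(size=(10, 64))    # 10 seen-gesture prototypes
unseen = rng.normal(size=(3, 64))   # 3 unseen-gesture prototypes
print(predict(rng.normal(size=64), seen, unseen))
```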
arXiv Detail & Related papers (2020-09-29T12:18:35Z)
- FineHand: Learning Hand Shapes for American Sign Language Recognition [16.862375555609667]
We present an approach for effective learning of hand shape embeddings, which are discriminative for ASL gestures.
For hand shape recognition, our method uses a mix of manually labelled hand shapes and high-confidence predictions to train a deep convolutional neural network (CNN).
We demonstrate that higher-quality hand shape models can significantly improve the accuracy of final video gesture classification.
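A minimal sketch of that labelling mix, assuming a simple confidence threshold for promoting model predictions to training labels; FineHand's exact selection recipe may differ.

```python
# Hedged self-training sketch: keep all manual labels, and add only
# high-confidence model predictions. Threshold and data structures are
# assumptions, not FineHand's exact recipe.

CONF_THRESHOLD = 0.95

def build_training_set(manual, predictions):
    """manual: [(crop, label)]; predictions: [(crop, label, confidence)]."""
    data = list(manual)
    for crop, label, conf in predictions:
        if conf >= CONF_THRESHOLD:  # trust only confident predictions
            data.append((crop, label))
    return data

manual = [("crop_0.png", "A"), ("crop_1.png", "B")]
preds = [("crop_2.png", "C", 0.98), ("crop_3.png", "A", 0.60)]
print(build_training_set(manual, preds))
# keeps both manual samples plus only the 0.98-confidence prediction
```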
arXiv Detail & Related papers (2020-03-04T23:32:08Z)