HaGRID - HAnd Gesture Recognition Image Dataset
- URL: http://arxiv.org/abs/2206.08219v2
- Date: Thu, 18 Jan 2024 15:02:56 GMT
- Title: HaGRID - HAnd Gesture Recognition Image Dataset
- Authors: Alexander Kapitanov, Karina Kvanchiani, Alexander Nagaev, Roman Kraynov, Andrei Makhliarchuk
- Abstract summary: This paper introduces HaGRID, a large dataset for building hand gesture recognition systems focused on controlling devices.
Although the gestures are static, they were chosen specifically so that several dynamic gestures can be built from them.
HaGRID contains 554,800 images and bounding box annotations with gesture labels for hand detection and gesture classification tasks.
- Score: 79.21033185563167
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper introduces HaGRID (HAnd Gesture Recognition Image Dataset), a
large dataset for building hand gesture recognition (HGR) systems focused on
controlling devices. For that reason, all 18 chosen gestures carry a semiotic
function and can be interpreted as a specific action. Although the gestures
are static, they were chosen specifically so that several dynamic gestures can
be built from them. This allows a trained model to recognize not only static
gestures such as "like" and "stop" but also dynamic gestures such as "swipes"
and "drag and drop". HaGRID contains 554,800 images and bounding box
annotations with gesture labels for hand detection and gesture classification
tasks. The low variability in context and subjects of other datasets motivated
the creation of a dataset without such limitations. Crowdsourcing platforms
allowed us to collect samples recorded by 37,583 subjects in at least as many
scenes, with subject-to-camera distances from 0.5 to 4 meters and various
natural light conditions. The influence of these diversity characteristics was
assessed in ablation study experiments. We also demonstrate that HaGRID can be
used for pretraining models in HGR tasks. HaGRID and the pretrained models are
publicly available.
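The abstract says the static gestures were chosen so that dynamic gestures such as swipes can be built from them, but it does not describe how. The following is only a minimal sketch of one possible approach, not the authors' method. It assumes a per-frame detector that outputs a static gesture label and a hand bounding box; the `Detection` type, the normalized `(x, y, width, height)` box convention, the `detect_swipe` function, and both thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One per-frame result (hypothetical format, not the HaGRID schema)."""
    label: str                               # static gesture label, e.g. "stop"
    box: Tuple[float, float, float, float]   # assumed (x, y, width, height) in [0, 1]


def box_center_x(box: Tuple[float, float, float, float]) -> float:
    """Horizontal center of a box given as (x, y, width, height)."""
    x, _, w, _ = box
    return x + w / 2.0


def detect_swipe(frames: List[Detection], gesture: str = "stop",
                 min_frames: int = 3, min_shift: float = 0.3) -> Optional[str]:
    """Return "swipe_right", "swipe_left", or None.

    Assumption: a swipe is the same static gesture held for at least
    `min_frames` consecutive frames while the hand center moves horizontally
    by at least `min_shift` (a fraction of the image width). Directions are
    in image coordinates, so a mirrored camera feed would swap them.
    """
    run: List[float] = []  # center x positions of the current uninterrupted run
    for det in frames:
        if det.label == gesture:
            run.append(box_center_x(det.box))
        else:
            run = []  # gesture interrupted: start over
        if len(run) >= min_frames:
            shift = run[-1] - run[0]
            if shift >= min_shift:
                return "swipe_right"
            if shift <= -min_shift:
                return "swipe_left"
    return None
```

In this sketch the classifier only ever sees static gestures; the dynamic gesture comes from tracking how the detected box moves between frames. A real system would likely also need temporal smoothing and a timeout, which are omitted here.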
Related papers
- GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding [31.01378033872341]
GeoGround is a novel framework that unifies support for HBB, OBB, and mask RS visual grounding tasks.
To support model training, we present refGeo, a large-scale RS visual instruction-following dataset containing 161k image-text pairs.
arXiv Detail & Related papers (2024-11-16T05:12:11Z)
- x-RAGE: eXtended Reality -- Action & Gesture Events Dataset [5.068559907583171]
We present the first event-camera based egocentric gesture dataset for enabling neuromorphic, low-power solutions for XR-centric gesture recognition.
The dataset has been made available publicly at the following URL: https://gitlab.com/NVM_IITD_Research/xrage.
arXiv Detail & Related papers (2024-10-25T11:44:06Z)
- Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation [6.782362178252351]
We introduce the Latent Embedding Exploitation (LEE) mechanism in our replay-based Few-Shot Continual Learning framework.
Our method produces a diversified latent feature space by leveraging a preserved latent embedding known as gesture prior knowledge.
Our method helps motor-impaired persons leverage wearable devices, and their unique styles of movement can be learned and applied.
arXiv Detail & Related papers (2024-05-14T21:20:27Z)
- Pix2Gif: Motion-Guided Diffusion for GIF Generation [70.64240654310754]
We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video) generation.
We propose a new motion-guided warping module to spatially transform the features of the source image conditioned on the two types of prompts.
In preparation for the model training, we meticulously curated data by extracting coherent image frames from the TGIF video-caption dataset.
arXiv Detail & Related papers (2024-03-07T16:18:28Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Video-based Pose-Estimation Data as Source for Transfer Learning in Human Activity Recognition [71.91734471596433]
Human Activity Recognition (HAR) using on-body devices identifies specific human actions in unconstrained environments.
Previous works demonstrated that transfer learning is a good strategy for addressing scenarios with scarce data.
This paper proposes using datasets intended for human-pose estimation as a source for transfer learning.
arXiv Detail & Related papers (2022-12-02T18:19:36Z)
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z)
- Hidden Footprints: Learning Contextual Walkability from 3D Human Trails [70.01257397390361]
Current datasets only tell you where people are, not where they could be.
We first augment the set of valid, labeled walkable regions by propagating person observations between images, utilizing 3D information to create what we call hidden footprints.
We devise a training strategy designed for such sparse labels, combining a class-balanced classification loss with a contextual adversarial loss.
arXiv Detail & Related papers (2020-08-19T23:19:08Z)
- A Deep Learning Framework for Recognizing both Static and Dynamic Gestures [0.8602553195689513]
We propose a unified framework that recognizes both static and dynamic gestures, using simple RGB vision (without depth sensing).
We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network - StaDNet.
In a number of experiments, we show that the proposed approach surpasses the state-of-the-art results on the large-scale Chalearn 2016 dataset.
arXiv Detail & Related papers (2020-06-11T10:39:02Z)
- IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition [11.917058689674327]
We introduce a new benchmark dataset named IPN Hand with sufficient size, variety, and real-world elements able to train and evaluate deep neural networks.
This dataset contains more than 4,000 gesture samples and 800,000 RGB frames from 50 distinct subjects.
With our dataset, the performance of three 3D-CNN models is evaluated on the tasks of isolated and continuous real-time HGR.
arXiv Detail & Related papers (2020-04-20T08:52:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.