IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand
Gesture Recognition
- URL: http://arxiv.org/abs/2005.02134v2
- Date: Tue, 20 Oct 2020 14:50:42 GMT
- Title: IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand
Gesture Recognition
- Authors: Gibran Benitez-Garcia, Jesus Olivares-Mercado, Gabriel Sanchez-Perez,
and Keiji Yanai
- Abstract summary: We introduce a new benchmark dataset named IPN Hand with sufficient size, variety, and real-world elements to train and evaluate deep neural networks.
This dataset contains more than 4,000 gesture samples and 800,000 RGB frames from 50 distinct subjects.
With our dataset, the performance of three 3D-CNN models is evaluated on the tasks of isolated and continuous real-time hand gesture recognition (HGR).
- Score: 11.917058689674327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a new benchmark dataset named IPN Hand with
sufficient size, variety, and real-world elements to train and evaluate
deep neural networks. This dataset contains more than 4,000 gesture samples and
800,000 RGB frames from 50 distinct subjects. We design 13 different static and
dynamic gestures focused on interaction with touchless screens. We especially
consider the scenario when continuous gestures are performed without transition
states, and when subjects perform natural movements with their hands as
non-gesture actions. Gestures were collected from about 30 diverse scenes, with
real-world variation in background and illumination. With our dataset, the
performance of three 3D-CNN models is evaluated on the tasks of isolated and
continuous real-time hand gesture recognition (HGR). Furthermore, we analyze
the possibility of increasing
the recognition accuracy by adding multiple modalities derived from RGB frames,
i.e., optical flow and semantic segmentation, while keeping the real-time
performance of the 3D-CNN model. Our empirical study also provides a comparison
with the publicly available nvGesture (NVIDIA) dataset. The experimental
results show that the accuracy of the state-of-the-art ResNeXt-101 model drops
by about 30% on our real-world dataset, demonstrating that the IPN Hand dataset
can serve as a benchmark and may help the community make progress in continuous
HGR. Our dataset and the pre-trained models used in the
evaluation are publicly available at https://github.com/GibranBenitez/IPN-hand.
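As a hedged illustration of the multi-modal setup described above, the sketch below derives a dense optical-flow stream from consecutive RGB frames using OpenCV's Farneback method; the function name and tensor shapes are our own assumptions, not code from the IPN Hand repository.

```python
# Sketch: deriving an optical-flow modality from an RGB clip
# (illustrative only, not the authors' code).
import cv2
import numpy as np

def rgb_to_flow_clip(frames):
    """frames: list of HxWx3 uint8 BGR frames -> (T-1, H, W, 2) float32 flow."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    flows = []
    for prev, nxt in zip(grays[:-1], grays[1:]):
        # Dense Farneback flow between consecutive grayscale frames; positional
        # args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)
    return np.stack(flows).astype(np.float32)
```

A 3D-CNN could then consume the RGB clip and this flow clip as separate input streams (or concatenated channels), which is the kind of modality fusion the abstract evaluates.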
Related papers
- WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild [53.288327629960364]
We present a data-driven pipeline for efficient multi-hand reconstruction in the wild.
The proposed pipeline is composed of two components: a real-time fully convolutional hand localization network and a high-fidelity transformer-based 3D hand reconstruction model.
Our approach outperforms previous methods in both efficiency and accuracy on popular 2D and 3D benchmarks.
arXiv Detail & Related papers (2024-09-18T18:46:51Z)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
Multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance.
Our study reveals an exponential need for training data, which implies that the key to "zero-shot" generalization under large-scale training paradigms remains to be found.
arXiv Detail & Related papers (2024-04-04T17:58:02Z)
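One hedged way to formalize the claimed scaling (our notation, not the paper's exact fit): if zero-shot performance grows roughly log-linearly with a concept's pretraining frequency, then any constant additive gain multiplies the data requirement.

```latex
% Illustrative log-linear scaling, with performance P and concept frequency n:
P(n) \approx a \log n + b
\quad\Longrightarrow\quad
n_{\mathrm{new}} = n \, e^{\Delta P / a}
% i.e., a constant additive gain \Delta P multiplies the required
% pretraining frequency n by a constant factor.
```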
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach that improves data diversity and boosts 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
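A minimal sketch of the similarity-aware sampling idea, assuming pose vectors and Euclidean distance; the names and thresholds are hypothetical, not HandBooster's actual implementation.

```python
# Hypothetical similarity-aware rejection sampling: keep candidate poses
# that are novel w.r.t. the training set but not implausibly far from it.
import numpy as np

def sample_novel_poses(candidates, train_poses, tau_novel, tau_real):
    """candidates: (M, D); train_poses: (N, D); thresholds are illustrative."""
    kept = []
    for pose in candidates:
        # Distance to the nearest training pose.
        d = np.min(np.linalg.norm(train_poses - pose, axis=1))
        if tau_novel < d < tau_real:
            kept.append(pose)
    return np.asarray(kept)
```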
- Temporal Graph Benchmark for Machine Learning on Temporal Graphs [54.52243310226456]
Temporal Graph Benchmark (TGB) is a collection of challenging and diverse benchmark datasets.
We benchmark each dataset and find that the performance of common models can vary drastically across datasets.
TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research.
arXiv Detail & Related papers (2023-07-03T13:58:20Z)
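A hedged usage sketch, following the loading interface shown in the project's public examples at the time of writing; class, argument, and dataset names may have changed, so treat this as an assumption rather than the definitive TGB API.

```python
# Assumed API; verify against the current py-tgb documentation.
from tgb.linkproppred.dataset import LinkPropPredDataset

dataset = LinkPropPredDataset(name="tgbl-wiki", root="datasets", preprocess=True)
data = dataset.full_data         # assumed: dict of sources/destinations/timestamps
train_mask = dataset.train_mask  # assumed: chronological train/val/test masks
```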
- Explicit Context Integrated Recurrent Neural Network for Sensor Data Applications [0.0]
Context Integrated RNN (CiRNN) enables integrating explicit contexts represented in the form of contextual features.
Experiments show improvements of 39% and 87%, respectively, over state-of-the-art models.
arXiv Detail & Related papers (2023-01-12T13:58:56Z)
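A minimal sketch of one way explicit context features could be integrated into a recurrent model, assuming a gating formulation; this is our illustration, not CiRNN's exact equations.

```python
# Context-gated GRU: explicit context scales the sensor input before recurrence.
import torch
import torch.nn as nn

class ContextRNN(nn.Module):
    def __init__(self, in_dim, ctx_dim, hidden_dim):
        super().__init__()
        self.ctx_proj = nn.Linear(ctx_dim, in_dim)
        self.rnn = nn.GRU(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x, ctx):
        # x: (B, T, in_dim) sensor stream; ctx: (B, ctx_dim) explicit context.
        gate = torch.sigmoid(self.ctx_proj(ctx)).unsqueeze(1)  # (B, 1, in_dim)
        out, _ = self.rnn(x * gate)
        return self.head(out[:, -1])  # predict from the last hidden state
```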
- Comparison of Data Representations and Machine Learning Architectures for User Identification on Arbitrary Motion Sequences [8.967985264567217]
This paper compares different machine learning approaches to identify users based on arbitrary sequences of head and hand movements.
We publish all our code to allow reproducibility and to provide baselines for future work.
The model correctly identifies any of the 34 subjects with an accuracy of 100% within 150 seconds.
arXiv Detail & Related papers (2022-10-02T14:12:10Z)
- A Novel Hand Gesture Detection and Recognition system based on ensemble-based Convolutional Neural Network [3.5665681694253903]
Detecting the hand region has become a challenging task in the computer vision and pattern recognition communities.
Deep learning algorithms such as the convolutional neural network (CNN) have become a very popular choice for classification tasks.
In this paper, an ensemble of CNN-based approaches is presented to overcome problems such as high variance during prediction, overfitting, and prediction errors.
arXiv Detail & Related papers (2022-02-25T06:46:58Z)
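A minimal sketch of the variance-reduction idea: averaging softmax outputs over independently trained CNNs (our illustration, not the paper's exact ensemble).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(models, images):
    """Average class probabilities over independently trained CNNs."""
    probs = torch.stack([F.softmax(m(images), dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)  # (B,) predicted labels
```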
- Decoding ECoG signal into 3D hand translation using deep learning [3.20238141000059]
Motor brain-computer interfaces (BCIs) are a promising technology that may enable motor-impaired people to interact with their environment.
Most ECoG signal decoders used to predict continuous hand movements are linear models.
Deep learning models, which are state-of-the-art in many problems, could be a solution to better capture this relationship.
arXiv Detail & Related papers (2021-10-05T15:41:04Z)
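A hedged sketch of what a deep, nonlinear decoder for this task could look like, assuming a windowed multi-channel ECoG input; shapes and layers are illustrative, not the paper's model.

```python
# Hypothetical regressor from an ECoG window to a 3D hand translation.
import torch
import torch.nn as nn

class ECoGTo3D(nn.Module):
    def __init__(self, n_channels=64, n_samples=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(128, 3),  # (dx, dy, dz) hand translation
        )

    def forward(self, x):  # x: (B, n_channels, n_samples)
        return self.net(x)
```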
- HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks [71.09275975580009]
HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology; the second one is the hand surface (a mesh), which does.
We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape, either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or with the classical segment-wise Non-Rigid Gravitational Approach (NRGA++).
arXiv Detail & Related papers (2021-07-02T17:59:54Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
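A minimal sketch of one neural message-passing step over a track-detection association graph; the module and feature names are our assumptions, not the paper's architecture.

```python
# One edge update on a track-detection graph (assumed formulation).
import torch
import torch.nn as nn

class EdgeUpdate(nn.Module):
    """Updates each association edge from its endpoint node features."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, edge_dim), nn.ReLU(),
            nn.Linear(edge_dim, edge_dim))

    def forward(self, h_track, h_det, e):
        # A classifier on the final edge features would score the
        # (fully trainable) data-association decision.
        return self.mlp(torch.cat([h_track, h_det, e], dim=-1))
```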
- Writing in The Air: Unconstrained Text Recognition from Finger Movement Using Spatio-Temporal Convolution [3.3502165500990824]
In this paper, we introduce a new benchmark dataset for the challenging writing in the air (WiTA) task.
WiTA implements an intuitive and natural writing method with finger movement for human-computer interaction.
Our dataset consists of five sub-datasets in two languages (Korean and English) and amounts to 209,926 instances from 122 participants.
arXiv Detail & Related papers (2021-04-19T02:37:46Z)
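A minimal sketch of a spatio-temporal (3D) convolutional block of the kind such models build on, with hypothetical channel sizes rather than the WiTA architecture.

```python
import torch
import torch.nn as nn

class STConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # A 3x3x3 kernel convolves jointly over time and space.
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm3d(out_ch)
        self.pool = nn.MaxPool3d((1, 2, 2))  # downsample space, keep time

    def forward(self, x):  # x: (B, C, T, H, W) video clip
        return self.pool(torch.relu(self.bn(self.conv(x))))
```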