A Deep Learning Framework for Recognizing both Static and Dynamic
Gestures
- URL: http://arxiv.org/abs/2006.06321v2
- Date: Wed, 17 Mar 2021 10:31:16 GMT
- Title: A Deep Learning Framework for Recognizing both Static and Dynamic
Gestures
- Authors: Osama Mazhar, Sofiane Ramdani, and Andrea Cherubini
- Abstract summary: We propose a unified framework that recognizes both static and dynamic gestures, using simple RGB vision (without depth sensing)
We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network - StaDNet.
In a number of experiments, we show that the proposed approach surpasses the state-of-the-art results on the large-scale Chalearn 2016 dataset.
- Score: 0.8602553195689513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Intuitive user interfaces are indispensable to interact with the human
centric smart environments. In this paper, we propose a unified framework that
recognizes both static and dynamic gestures, using simple RGB vision (without
depth sensing). This feature makes it suitable for inexpensive human-robot
interaction in social or industrial settings. We employ a pose-driven spatial
attention strategy, which guides our proposed Static and Dynamic gestures
Network - StaDNet. From the image of the human upper body, we estimate his/her
depth, along with the region-of-interest around his/her hands. The
Convolutional Neural Network in StaDNet is fine-tuned on a
background-substituted hand gestures dataset. It is utilized to detect 10
static gestures for each hand as well as to obtain the hand image-embeddings.
These are subsequently fused with the augmented pose vector and then passed to
the stacked Long Short-Term Memory blocks. Thus, human-centred frame-wise
information from the augmented pose vector and from the left/right hands
image-embeddings are aggregated in time to predict the dynamic gestures of the
performing person. In a number of experiments, we show that the proposed
approach surpasses the state-of-the-art results on the large-scale Chalearn
2016 dataset. Moreover, we transfer the knowledge learned through the proposed
methodology to the Praxis gestures dataset, and the obtained results also
outscore the state-of-the-art on this dataset.
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI exhibits remarkable improvement by 10% to 64% compared with previous state-of-the-art in real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z) - HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z) - Snapture -- A Novel Neural Architecture for Combined Static and Dynamic
Hand Gesture Recognition [19.320551882950706]
We propose a novel hybrid hand gesture recognition system.
Our architecture enables learning both static and dynamic gestures.
Our work contributes both to gesture recognition research and machine learning applications for non-verbal communication with robots.
arXiv Detail & Related papers (2022-05-28T11:12:38Z) - HighlightMe: Detecting Highlights from Human-Centric Videos [62.265410865423]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z) - Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z) - HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural
Networks [71.09275975580009]
HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology.
We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or classical segment-wise Non-Rigid Gravitational Approach (NRGA++) which
arXiv Detail & Related papers (2021-07-02T17:59:54Z) - Learning Dynamics via Graph Neural Networks for Human Pose Estimation
and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in current fame.
Specifically, we derive this prediction of dynamics through a graph neural network(GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z) - "What's This?" -- Learning to Segment Unknown Objects from Manipulation
Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z) - Self-supervised Human Activity Recognition by Learning to Predict
Cross-Dimensional Motion [16.457778420360537]
We propose the use of self-supervised learning for human activity recognition with smartphone accelerometer data.
First, the representations of unlabeled input signals are learned by training a deep convolutional neural network to predict a segment of accelerometer values.
For this task, we add a number of fully connected layers to the end of the frozen network and train the added layers with labeled accelerometer signals to learn to classify human activities.
arXiv Detail & Related papers (2020-10-21T02:14:31Z) - Gesture Recognition from Skeleton Data for Intuitive Human-Machine
Interaction [0.6875312133832077]
We propose an approach for segmentation and classification of dynamic gestures based on a set of handcrafted features.
The method for gesture recognition applies a sliding window, which extracts information from both the spatial and temporal dimensions.
At the end, the recognized gestures are used to interact with a collaborative robot.
arXiv Detail & Related papers (2020-08-26T11:28:50Z) - 3D dynamic hand gestures recognition using the Leap Motion sensor and
convolutional neural networks [0.0]
We present a method for the recognition of a set of non-static gestures acquired through the Leap Motion sensor.
The acquired gesture information is converted in color images, where the variation of hand joint positions during the gesture are projected on a plane.
The classification of the gestures is performed using a deep Convolutional Neural Network (CNN)
arXiv Detail & Related papers (2020-03-03T11:05:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.