Snapture -- A Novel Neural Architecture for Combined Static and Dynamic
Hand Gesture Recognition
- URL: http://arxiv.org/abs/2205.15862v2
- Date: Tue, 27 Feb 2024 10:59:33 GMT
- Title: Snapture -- A Novel Neural Architecture for Combined Static and Dynamic
Hand Gesture Recognition
- Authors: Hassan Ali, Doreen Jirak, Stefan Wermter
- Abstract summary: We propose a novel hybrid hand gesture recognition system.
Our architecture enables learning both static and dynamic gestures.
Our work contributes both to gesture recognition research and machine learning applications for non-verbal communication with robots.
- Score: 19.320551882950706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As robots are expected to get more involved in people's everyday lives,
frameworks that enable intuitive user interfaces are in demand. Hand gesture
recognition systems provide a natural way of communication and, thus, are an
integral part of seamless Human-Robot Interaction (HRI). Recent years have
witnessed an immense evolution of computational models powered by deep
learning. However, state-of-the-art models fall short in expanding across
different gesture domains, such as emblems and co-speech. In this paper, we
propose a novel hybrid hand gesture recognition system. Our architecture
enables learning both static and dynamic gestures: by capturing a so-called
"snapshot" of the gesture performance at its peak, we integrate the hand pose
along with the dynamic movement. Moreover, we present a method for analyzing
the motion profile of a gesture to uncover its dynamic characteristics, which
allows regulating the static channel based on the amount of motion. Our
evaluation demonstrates the superiority of our approach over a CNNLSTM baseline
on two gesture benchmarks. We also provide a per-gesture-class analysis that
reveals the potential of our Snapture architecture for
performance improvements. Thanks to its modular implementation, our framework
allows the integration of other multimodal data like facial expressions and
head tracking, which are important cues in HRI scenarios, into one
architecture. Thus, our work contributes both to gesture recognition research
and machine learning applications for non-verbal communication with robots.
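The abstract describes the two-channel design only at a high level. As an illustration, here is a minimal PyTorch-style sketch of such a hybrid: a CNN-LSTM encodes the frame sequence (dynamic channel), a CNN encodes the peak "snapshot" frame (static channel), and a scalar motion score regulates the static channel. All module names, dimensions, and the gating rule are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class HybridGestureNet(nn.Module):
    """Illustrative two-channel gesture model (not the paper's code).

    Dynamic channel: per-frame CNN features -> LSTM over time.
    Static channel: CNN features of the "snapshot" frame at the
    gesture's peak, gated by a scalar motion score.
    """

    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.frame_cnn = nn.Sequential(  # tiny CNN shared by both channels
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, frames, snapshot, motion_score):
        # frames: (B, T, 3, H, W); snapshot: (B, 3, H, W);
        # motion_score: (B, 1) in [0, 1], from a motion-profile analysis.
        b, t = frames.shape[:2]
        seq = self.frame_cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(seq)              # (num_layers, B, D)
        dynamic = h[-1]
        static = self.frame_cnn(snapshot)
        # Assumed gating rule: the snapshot counts most when motion is low.
        static = static * (1.0 - motion_score)
        return self.classifier(torch.cat([dynamic, static], dim=1))
```

Whether high motion should fully suppress or merely down-weight the static channel is a design choice; the sketch assumes a simple linear down-weighting.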
Related papers
- Recognition of Dynamic Hand Gestures in Long Distance using a Web-Camera for Robot Guidance [2.625826951636656]
We propose a model for recognizing dynamic gestures from a long distance of up to 20 meters.
The model integrates the SlowFast and Transformer architectures (SFT) to effectively process and classify complex gesture sequences captured in video frames.
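The summary names the building blocks but not their wiring. A hedged sketch of the general SlowFast-plus-Transformer idea (two-rate frame sampling feeding a Transformer encoder; all layer sizes are assumptions, not the SFT model):

```python
import torch
import torch.nn as nn

class SlowFastTransformerSketch(nn.Module):
    """Toy combination of two-rate frame sampling with a Transformer
    classifier; this is an illustration, not the SFT architecture."""

    def __init__(self, num_classes: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Sequential(  # shared per-frame encoder
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):
        # video: (B, T, 3, H, W); slow pathway samples every 4th frame,
        # fast pathway keeps every frame (the two-rate SlowFast idea).
        tokens = torch.cat([video[:, ::4], video], dim=1)
        b, t = tokens.shape[:2]
        feats = self.embed(tokens.flatten(0, 1)).view(b, t, -1)
        return self.head(self.encoder(feats).mean(dim=1))
```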
arXiv Detail & Related papers (2024-06-18T09:17:28Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- Efficient Gesture Recognition for the Assistance of Visually Impaired People using Multi-Head Neural Networks [5.883916678819684]
This paper proposes an interactive system for mobile devices controlled by hand gestures aimed at helping people with visual impairments.
This system allows the user to interact with the device by making simple static and dynamic hand gestures.
Each gesture triggers a different action in the system, such as object recognition, scene description or image scaling.
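A small sketch of that dispatch design; the gesture vocabulary and actions below are hypothetical stand-ins:

```python
from typing import Callable, Dict

# Hypothetical handlers standing in for the system's real actions.
def recognize_objects() -> str: return "objects: cup, door"
def describe_scene() -> str: return "scene: a kitchen"
def scale_image() -> str: return "image scaled 2x"

ACTIONS: Dict[str, Callable[[], str]] = {
    "point": recognize_objects,  # static gesture
    "wave": describe_scene,      # dynamic gesture
    "pinch": scale_image,        # dynamic gesture
}

def on_gesture(label: str) -> str:
    """Route a recognized gesture label to its action."""
    handler = ACTIONS.get(label)
    return handler() if handler else "unknown gesture"

print(on_gesture("wave"))  # -> scene: a kitchen
```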
arXiv Detail & Related papers (2022-05-14T06:01:47Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z)
- Gesture Similarity Analysis on Event Data Using a Hybrid Guided Variational Auto Encoder [3.1148846501645084]
We propose a neuromorphic gesture analysis system which naturally declutters the background and analyzes gestures at high temporal resolution.
Our results show that the features learned by the VAE provides a similarity measure capable of clustering and pseudo labeling of new gestures.
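A hedged sketch of that idea, measuring similarity between latent codes and pseudo-labeling a new gesture by its nearest cluster centroid (the 8-D vectors below stand in for VAE encodings; the threshold is arbitrary):

```python
import numpy as np
from typing import Dict

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def pseudo_label(z_new: np.ndarray,
                 centroids: Dict[str, np.ndarray],
                 threshold: float = 0.8) -> str:
    """Assign the most similar cluster's label, or 'new' below threshold."""
    best_label, best_sim = "new", threshold
    for label, c in centroids.items():
        sim = cosine_similarity(z_new, c)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

rng = np.random.default_rng(0)
centroids = {"swipe": rng.normal(size=8), "circle": rng.normal(size=8)}
z = centroids["swipe"] + 0.1 * rng.normal(size=8)  # a slightly noisy "swipe"
print(pseudo_label(z, centroids))  # -> swipe
```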
arXiv Detail & Related papers (2021-03-31T23:58:34Z)
- Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene.
ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling of human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Gesture Recognition from Skeleton Data for Intuitive Human-Machine Interaction [0.6875312133832077]
We propose an approach for segmentation and classification of dynamic gestures based on a set of handcrafted features.
The method for gesture recognition applies a sliding window, which extracts information from both the spatial and temporal dimensions.
At the end, the recognized gestures are used to interact with a collaborative robot.
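A minimal sketch of the sliding-window idea (window length, stride, and the toy spatial/temporal features are placeholders for the paper's handcrafted set):

```python
import numpy as np

def sliding_windows(seq: np.ndarray, win: int = 30, stride: int = 5):
    """Yield overlapping windows from a (T, J, 3) skeleton sequence."""
    for start in range(0, len(seq) - win + 1, stride):
        yield start, seq[start:start + win]

def window_features(window: np.ndarray) -> np.ndarray:
    """Toy features: mean joint positions (spatial) and mean per-frame
    joint displacement (temporal)."""
    spatial = window.mean(axis=0).ravel()                          # (J*3,)
    temporal = np.abs(np.diff(window, axis=0)).mean(axis=(0, 1))   # (3,)
    return np.concatenate([spatial, temporal])

seq = np.random.default_rng(0).normal(size=(120, 21, 3))  # 120 frames, 21 joints
for start, w in sliding_windows(seq):
    feats = window_features(w)  # would feed a per-window gesture classifier
```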
arXiv Detail & Related papers (2020-08-26T11:28:50Z)
- Hierarchical Contrastive Motion Learning for Video Action Recognition [100.9807616796383]
We present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames.
Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network.
Our motion learning module is lightweight and flexible to be embedded into various backbone networks.
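As a generic illustration of contrastive learning on motion features, here is a standard InfoNCE loss over two augmented views of the same clips (the paper's hierarchical objective applies such losses across abstraction levels; this sketch shows a single level):

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    """Matching rows of z1/z2 are positives; all other rows in the
    batch serve as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                 # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: 16 clips, 128-D motion embeddings from two augmented views.
z_a, z_b = torch.randn(16, 128), torch.randn(16, 128)
loss = info_nce(z_a, z_b)
```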
arXiv Detail & Related papers (2020-07-20T17:59:22Z)
- A Deep Learning Framework for Recognizing both Static and Dynamic Gestures [0.8602553195689513]
We propose a unified framework that recognizes both static and dynamic gestures, using simple RGB vision (without depth sensing).
We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network - StaDNet.
In a number of experiments, we show that the proposed approach surpasses the state-of-the-art results on the large-scale Chalearn 2016 dataset.
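A hedged sketch of what pose-driven spatial attention can look like: weighting a CNN feature map by a Gaussian centered on a detected hand keypoint (shapes and the Gaussian prior are assumptions, not StaDNet's code):

```python
import torch

def pose_attention(feat: torch.Tensor, hand_xy: torch.Tensor,
                   sigma: float = 0.1) -> torch.Tensor:
    """Gate a (B, C, H, W) feature map with a Gaussian centered at
    hand_xy, a (B, 2) tensor of normalized (x, y) coords in [0, 1]."""
    b, _, h, w = feat.shape
    ys = torch.linspace(0, 1, h).view(1, h, 1)
    xs = torch.linspace(0, 1, w).view(1, 1, w)
    cx, cy = hand_xy[:, 0].view(b, 1, 1), hand_xy[:, 1].view(b, 1, 1)
    attn = torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return feat * attn.unsqueeze(1)  # broadcast (B, 1, H, W) over channels

feat = torch.randn(2, 64, 14, 14)              # e.g. backbone features
hands = torch.tensor([[0.3, 0.6], [0.7, 0.2]])
out = pose_attention(feat, hands)
```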
arXiv Detail & Related papers (2020-06-11T10:39:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.