A Two-stream Neural Network for Pose-based Hand Gesture Recognition
- URL: http://arxiv.org/abs/2101.08926v1
- Date: Fri, 22 Jan 2021 03:22:26 GMT
- Title: A Two-stream Neural Network for Pose-based Hand Gesture Recognition
- Authors: Chuankun Li, Shuai Li, Yanbo Gao, Xiang Zhang, Wanqing Li
- Abstract summary: Pose-based hand gesture recognition has been widely studied in recent years.
This paper proposes a two-stream neural network, with one stream being a self-attention based graph convolutional network (SAGCN) and the other a residual-connection enhanced bidirectional Independently Recurrent Neural Network (RBi-IndRNN).
The residual-connection enhanced Bi-IndRNN extends an IndRNN with the capability of bidirectional processing for temporal modelling.
- Score: 23.50938160992517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pose-based hand gesture recognition has been widely studied in recent
years. Compared with full-body action recognition, hand gestures involve joints
that are spatially more closely distributed and collaborate more strongly. This
requires a different approach from action recognition to capture the complex
spatial features. Many gesture categories, such as "Grab" and "Pinch", have
very similar motion or temporal patterns, posing a challenge for temporal
processing. To address these challenges, this paper proposes a two-stream
neural network with one stream being a self-attention based graph convolutional
network (SAGCN) extracting the short-term temporal information and hierarchical
spatial information, and the other being a residual-connection enhanced
bidirectional Independently Recurrent Neural Network (RBi-IndRNN) for
extracting long-term temporal information. The self-attention based graph
convolutional network has a dynamic self-attention mechanism to adaptively
exploit the relationships of all hand joints in addition to the fixed topology
and local feature extraction in the GCN. On the other hand, the
residual-connection enhanced Bi-IndRNN extends an IndRNN with the capability of
bidirectional processing for temporal modelling. The two streams are fused
together for recognition. The Dynamic Hand Gesture dataset and First-Person
Hand Action dataset are used to validate its effectiveness, and our method
achieves state-of-the-art performance.
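To make the two-stream design above concrete, the following is a minimal PyTorch sketch written from the abstract alone: one stream couples a fixed hand-skeleton graph with a dynamic self-attention adjacency and a short temporal convolution (SAGCN-style), the other runs a bidirectional IndRNN with a residual connection for long-term temporal modelling (RBi-IndRNN-style), and their class scores are fused. The layer widths, the placeholder identity adjacency, the 22-joint/14-class defaults, and the score-level fusion are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal two-stream sketch inferred from the abstract (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentionGraphConv(nn.Module):
    """Graph convolution over hand joints with a data-dependent adjacency added
    to a fixed skeleton topology, followed by a short-term temporal conv."""

    def __init__(self, in_channels, out_channels, adjacency, embed_dim=16):
        super().__init__()
        self.register_buffer("A", adjacency)               # fixed topology (V, V)
        self.theta = nn.Conv2d(in_channels, embed_dim, 1)  # query embedding
        self.phi = nn.Conv2d(in_channels, embed_dim, 1)    # key embedding
        self.gcn = nn.Conv2d(in_channels, out_channels, 1)
        self.tcn = nn.Conv2d(out_channels, out_channels,
                             kernel_size=(3, 1), padding=(1, 0))  # short-term temporal

    def forward(self, x):                      # x: (N, C, T, V)
        # Dynamic self-attention adjacency computed from joint features.
        q = self.theta(x).mean(dim=2)          # (N, E, V), pooled over time
        k = self.phi(x).mean(dim=2)            # (N, E, V)
        attn = torch.softmax(torch.einsum("nev,new->nvw", q, k), dim=-1)
        A = self.A.unsqueeze(0) + attn         # fixed topology + dynamic relations
        x = torch.einsum("nctv,nvw->nctw", x, A)
        x = F.relu(self.gcn(x))
        return F.relu(self.tcn(x))


class ResidualBiIndRNN(nn.Module):
    """Bidirectional IndRNN layer (h_t = relu(W x_t + u * h_{t-1})) with a
    residual connection, used here for long-term temporal modelling."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w = nn.Linear(input_size, hidden_size)
        self.u_fwd = nn.Parameter(torch.rand(hidden_size))  # independent recurrent weights
        self.u_bwd = nn.Parameter(torch.rand(hidden_size))
        self.proj = nn.Linear(input_size, 2 * hidden_size)  # residual projection

    def _run(self, x, u):                      # x: (N, T, C_in)
        h = x.new_zeros(x.size(0), u.numel())
        wx = self.w(x)
        outputs = []
        for t in range(x.size(1)):
            h = F.relu(wx[:, t] + u * h)       # element-wise recurrence per unit
            outputs.append(h)
        return torch.stack(outputs, dim=1)

    def forward(self, x):                      # (N, T, C_in) -> (N, T, 2*hidden)
        fwd = self._run(x, self.u_fwd)
        bwd = self._run(torch.flip(x, dims=[1]), self.u_bwd).flip(dims=[1])
        return torch.cat([fwd, bwd], dim=-1) + self.proj(x)


class TwoStreamGestureNet(nn.Module):
    def __init__(self, num_joints=22, in_channels=3, num_classes=14):
        super().__init__()
        adjacency = torch.eye(num_joints)      # placeholder skeleton topology
        self.sagcn = SelfAttentionGraphConv(in_channels, 64, adjacency)
        self.rbi = ResidualBiIndRNN(num_joints * in_channels, 128)
        self.head_g = nn.Linear(64, num_classes)
        self.head_r = nn.Linear(256, num_classes)

    def forward(self, x):                      # x: (N, C, T, V)
        g = self.sagcn(x).mean(dim=(2, 3))     # global pooling over time and joints
        seq = x.permute(0, 2, 3, 1).flatten(2) # (N, T, V*C) sequence of poses
        r = self.rbi(seq).mean(dim=1)
        return self.head_g(g) + self.head_r(r) # late fusion of the two streams


scores = TwoStreamGestureNet()(torch.randn(2, 3, 32, 22))  # (2, 14) class scores
```

The key distinction of the IndRNN recurrence sketched above is that each hidden unit has its own scalar recurrent weight rather than a full recurrent matrix, which is what allows stacking and training over long sequences; the bidirectional pass and residual projection follow the abstract's description of the RBi-IndRNN stream.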
Related papers
- Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition [0.0]
In this paper, we propose a self-attention GCN hybrid model, the Multi-Scale Spatial-Temporal self-attention (MSST)-GCN.
We utilize a spatial self-attention module with adaptive topology to understand intra-frame interactions among different body parts, and a temporal self-attention module to examine correlations between frames of a node.
arXiv Detail & Related papers (2024-04-03T10:25:45Z) - DD-GCN: Directed Diffusion Graph Convolutional Network for
Skeleton-based Human Action Recognition [10.115283931959855]
Graph Convolutional Networks (GCNs) have been widely used in skeleton-based human action recognition.
In this paper, we construct directed diffusion graphs for action modeling and introduce an activity partition strategy.
We also present a spatio-temporal synchronization encoder to embed synchronized spatio-temporal semantics.
arXiv Detail & Related papers (2023-08-24T01:53:59Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Joint-bone Fusion Graph Convolutional Network for Semi-supervised
Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream (a minimal sketch of deriving the bone stream from the joint stream appears after this list).
The pose-prediction-based auto-encoder in the self-supervised training stage allows the network to learn motion representations from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z) - Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based
Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z) - HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based
Gesture Recognition [73.64451471862613]
We propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition.
A joint self-attention module is used to capture spatial features of fingers, and a finger self-attention module is designed to aggregate features of the whole hand.
Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.
arXiv Detail & Related papers (2021-06-25T02:15:53Z) - A Study On the Effects of Pre-processing On Spatio-temporal Action
Recognition Using Spiking Neural Networks Trained with STDP [0.0]
It is important to study the behavior of SNNs trained with unsupervised learning methods on video classification tasks.
This paper presents methods of transposing temporal information into a static format, and then transforming the visual information into spikes using latency coding.
We show the effect of the similarity in the shape and speed of certain actions on action recognition with spiking neural networks.
arXiv Detail & Related papers (2021-05-31T07:07:48Z) - Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z) - Temporal Graph Modeling for Skeleton-based Action Recognition [25.788239844759246]
We propose a Temporal Enhanced Graph Convolutional Network (TE-GCN) to capture complex temporal dynamics.
The constructed temporal relation graph explicitly builds connections between semantically related temporal features.
Experiments are performed on two widely used large-scale datasets.
arXiv Detail & Related papers (2020-12-16T09:02:47Z) - A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion
Compensation for Action Recognition in the EPIC-Kitchens Dataset [68.8204255655161]
Action recognition is one of the most challenging research fields in computer vision.
Ego-motion recorded sequences have become particularly relevant.
The proposed method aims to cope with this by estimating the ego-motion, or camera motion.
arXiv Detail & Related papers (2020-08-26T14:44:45Z)
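Several of the related works above, such as the joint-bone fusion entry, process a bone stream alongside the joint stream. As referenced in that entry, the following is a minimal NumPy sketch of the common way a bone stream is derived from a joint stream, namely as the difference between each joint and its parent; the hand topology in the example is hypothetical and not taken from any of the listed papers.

```python
# Minimal sketch: deriving a bone stream from a joint stream, as commonly done
# in joint-bone two-stream skeleton models. The parent list below is a
# hypothetical toy topology, not the one used by any specific paper.
import numpy as np


def joints_to_bones(joints, parents):
    """joints: (T, V, 3) joint coordinates; parents: length-V list of parent
    indices (-1 for the root). Returns (T, V, 3) bone vectors, where
    bone v = joint v minus its parent joint (the root keeps a zero bone)."""
    bones = np.zeros_like(joints)
    for v, p in enumerate(parents):
        if p >= 0:                       # root joint has no parent
            bones[:, v] = joints[:, v] - joints[:, p]
    return bones


# Toy example: a 6-joint hand fragment (wrist -> palm -> four finger bases).
parents = [-1, 0, 1, 1, 1, 1]
bones = joints_to_bones(np.random.rand(32, 6, 3), parents)   # (32, 6, 3)
```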
This list is automatically generated from the titles and abstracts of the papers on this site.