HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition
- URL: http://arxiv.org/abs/2106.13391v1
- Date: Fri, 25 Jun 2021 02:15:53 GMT
- Title: HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition
- Authors: Jianbo Liu, Ying Wang, Shiming Xiang, Chunhong Pan
- Abstract summary: We propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition.
The joint self-attention module captures the spatial features of the fingers, while the finger self-attention module aggregates the features of the whole hand.
Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.
- Score: 73.64451471862613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous methods for skeleton-based gesture recognition mostly arrange the
skeleton sequence into a pseudo picture or a spatial-temporal graph and apply a
deep Convolutional Neural Network (CNN) or Graph Convolutional Network (GCN)
for feature extraction. Although these methods achieve superior results, they
have inherent limitations in dynamically capturing local features of interacting
hand parts, and their computational efficiency remains a serious issue. In this
work, the self-attention mechanism is introduced to alleviate these problems.
Considering the hierarchical structure of hand joints, we propose an efficient
hierarchical self-attention network (HAN) for skeleton-based gesture
recognition, which is based on pure self-attention without any CNN, RNN or GCN
operators. Specifically, the joint self-attention module is used to capture
spatial features of the fingers, while the finger self-attention module is
designed to aggregate features of the whole hand. In terms of temporal features, the
temporal self-attention module is utilized to capture the temporal dynamics of
the fingers and the entire hand. Finally, these features are fused by the
fusion self-attention module for gesture classification. Experiments show that
our method achieves competitive results on three gesture recognition datasets
with much lower computational complexity.
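The abstract describes the four-module hierarchy but gives no implementation details, so the snippet below is only a minimal, hypothetical sketch of how a joint -> finger -> temporal -> fusion self-attention stack might be wired with standard PyTorch multi-head attention. The class names, the 5-fingers-by-4-joints hand layout, the mean pooling between levels, and the shared temporal module are all illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch of a joint -> finger -> temporal -> fusion self-attention
# hierarchy for a hand-skeleton sequence. Shapes, pooling, and module sharing
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class SelfAttnBlock(nn.Module):
    """Multi-head self-attention with a residual connection and LayerNorm."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, tokens, dim)
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)


class HANSketch(nn.Module):
    def __init__(self, dim=64, fingers=5, joints_per_finger=4, num_classes=14):
        super().__init__()
        self.fingers, self.joints = fingers, joints_per_finger
        self.embed = nn.Linear(3, dim)           # (x, y, z) coordinate -> feature
        self.joint_attn = SelfAttnBlock(dim)     # joints within one finger
        self.finger_attn = SelfAttnBlock(dim)    # fingers within the hand
        self.temporal_attn = SelfAttnBlock(dim)  # frames of a finger/hand stream
        self.fusion_attn = SelfAttnBlock(dim)    # fuse finger and hand tokens
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # x: (B, T, fingers * joints_per_finger, 3), joints ordered finger-major
        B, T, J, _ = x.shape
        h = self.embed(x)                                           # (B, T, J, C)
        h = h.view(B * T * self.fingers, self.joints, -1)
        finger = self.joint_attn(h).mean(dim=1)                     # per-finger, per-frame
        finger = finger.view(B * T, self.fingers, -1)
        hand = self.finger_attn(finger).mean(dim=1)                 # per-frame hand feature

        # temporal self-attention over each finger stream and over the hand stream
        f = finger.view(B, T, self.fingers, -1).transpose(1, 2)     # (B, F, T, C)
        f = self.temporal_attn(f.reshape(B * self.fingers, T, -1)).mean(dim=1)
        hand = self.temporal_attn(hand.view(B, T, -1)).mean(dim=1)  # (B, C)

        # fusion self-attention over the five finger tokens and the hand token
        tokens = torch.cat([f.view(B, self.fingers, -1), hand.unsqueeze(1)], dim=1)
        return self.head(self.fusion_attn(tokens).mean(dim=1))      # (B, num_classes)


logits = HANSketch()(torch.randn(2, 32, 20, 3))  # 2 sequences, 32 frames, 20 joints
print(logits.shape)                              # torch.Size([2, 14])
```

Consistent with the abstract's claim of a pure self-attention design, the only learned operators in this sketch are linear projections and multi-head attention; no CNN, RNN or GCN layers are involved.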
Related papers
- Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition [0.0]
In this paper, we propose a self-attention GCN hybrid model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN.
We utilize a spatial self-attention module with adaptive topology to understand intra-frame interactions among different body parts, and a temporal self-attention module to examine correlations between frames of a node.
arXiv Detail & Related papers (2024-04-03T10:25:45Z) - Part Aware Contrastive Learning for Self-Supervised Action Recognition [18.423841093299135]
This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR.
Our proposed SkeAttnCLR outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.
arXiv Detail & Related papers (2023-05-01T05:31:48Z) - Neural Eigenfunctions Are Structured Representation Learners [93.53445940137618]
This paper introduces a structured, adaptive-length deep representation called Neural Eigenmap.
We show that, when the eigenfunction is derived from positive relations in a data augmentation setup, applying NeuralEF results in an objective function.
We demonstrate using such representations as adaptive-length codes in image retrieval systems.
arXiv Detail & Related papers (2022-10-23T07:17:55Z) - Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition [32.07659338674024]
Graph convolutional networks (GCNs) can model human body skeletons as spatial and temporal graphs.
In this work, we propose pose-guided GCN (PG-GCN), a multi-modal framework for high-performance human action recognition.
The core idea is to utilize a trainable graph to aggregate features from the skeleton stream with those of the pose stream, which leads to a network with more robust feature representations (a minimal sketch of this trainable-graph idea appears after this list).
arXiv Detail & Related papers (2022-10-10T02:08:49Z) - Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z) - SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by several factors.
Inspired by recent attention mechanisms, we propose a multi-grain contextual focus module, termed MCF, to capture action-associated relational information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z) - Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose-prediction-based auto-encoder in the self-supervised training stage allows the network to learn motion representations from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z) - A Two-stream Neural Network for Pose-based Hand Gesture Recognition [23.50938160992517]
Pose-based hand gesture recognition has been widely studied in recent years.
This paper proposes a two-stream neural network, with one stream being a self-attention-based graph convolutional network (SAGCN).
The other stream, a residual-connection-enhanced Bi-IndRNN, extends an IndRNN with the capability of bidirectional processing for temporal modelling.
arXiv Detail & Related papers (2021-01-22T03:22:26Z) - Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
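As a companion to the PG-GCN entry above, here is a minimal, hypothetical illustration of a trainable graph that aggregates skeleton-stream features with pose-stream features: a single learnable joint-to-joint adjacency shared by both streams. The class name, the softmax normalization, and the additive fusion are assumptions made for illustration, not the PG-GCN architecture.

```python
# Hypothetical sketch: a learnable adjacency matrix aggregates per-joint
# features from a skeleton stream and a pose stream. Names, shapes, and the
# simple normalization are illustrative assumptions.
import torch
import torch.nn as nn


class TrainableGraphFusion(nn.Module):
    def __init__(self, num_joints=25, dim=64):
        super().__init__()
        # learnable joint-to-joint adjacency, shared by both streams
        self.adj = nn.Parameter(torch.eye(num_joints)
                                + 0.01 * torch.randn(num_joints, num_joints))
        self.proj_skel = nn.Linear(dim, dim)
        self.proj_pose = nn.Linear(dim, dim)

    def forward(self, skel, pose):
        # skel, pose: (B, J, C) per-frame joint features from the two streams
        a = torch.softmax(self.adj, dim=-1)   # row-normalized adjacency
        skel = a @ self.proj_skel(skel)       # graph aggregation, skeleton stream
        pose = a @ self.proj_pose(pose)       # graph aggregation, pose stream
        return skel + pose                    # fused joint features


fused = TrainableGraphFusion()(torch.randn(2, 25, 64), torch.randn(2, 25, 64))
print(fused.shape)  # torch.Size([2, 25, 64])
```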
This list is automatically generated from the titles and abstracts of the papers in this site.