Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition
- URL: http://arxiv.org/abs/2404.02624v1
- Date: Wed, 3 Apr 2024 10:25:45 GMT
- Title: Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition
- Authors: Ikuo Nakamura,
- Abstract summary: In this paper, we propose self-attention GCN hybrid model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN.
We utilize spatial self-attention module with adaptive topology to understand intra-frame interactions within a frame among different body parts, and temporal self-attention module to examine correlations between frames of a node.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skeleton-based gesture recognition methods have achieved high success using Graph Convolutional Network (GCN). In addition, context-dependent adaptive topology as a neighborhood vertex information and attention mechanism leverages a model to better represent actions. In this paper, we propose self-attention GCN hybrid model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN to effectively improve modeling ability to achieve state-of-the-art results on several datasets. We utilize spatial self-attention module with adaptive topology to understand intra-frame interactions within a frame among different body parts, and temporal self-attention module to examine correlations between frames of a node. These two are followed by multi-scale convolution network with dilations, which not only captures the long-range temporal dependencies of joints but also the long-range spatial dependencies (i.e., long-distance dependencies) of node temporal behaviors. They are combined into high-level spatial-temporal representations and output the predicted action with the softmax classifier.
Related papers
- Leveraging Spatio-Temporal Dependency for Skeleton-Based Action
Recognition [9.999149887494646]
Skeleton-based action recognition has attracted considerable attention due to its compact representation of the human body's skeletal sucrture.
Many recent methods have achieved remarkable performance using graph convolutional networks (GCNs) and convolutional neural networks (CNNs)
arXiv Detail & Related papers (2022-12-09T10:37:22Z) - Multi-Scale Spatial Temporal Graph Convolutional Network for
Skeleton-Based Action Recognition [13.15374205970988]
We present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module.
The MS-GC and MT-GC modules decompose the corresponding local graph convolution into a set of sub-graph convolutions, forming a hierarchical residual architecture.
We propose a multi-scale spatial temporal graph convolutional network (MST-GCN), which stacks multiple blocks to learn effective motion representations for action recognition.
arXiv Detail & Related papers (2022-06-27T03:17:33Z) - SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recent proposed skeleton-based action recognition methods is limited by the following factors.
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z) - Spatio-Temporal Joint Graph Convolutional Networks for Traffic
Forecasting [75.10017445699532]
Recent have shifted their focus towards formulating traffic forecasting as atemporal graph modeling problem.
We propose a novel approach for accurate traffic forecasting on road networks over multiple future time steps.
arXiv Detail & Related papers (2021-11-25T08:45:14Z) - Multi-Scale Semantics-Guided Neural Networks for Efficient
Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z) - HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based
Gesture Recognition [73.64451471862613]
We propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition.
Joint self-attention module is used to capture spatial features of fingers, the finger self-attention module is designed to aggregate features of the whole hand.
Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.
arXiv Detail & Related papers (2021-06-25T02:15:53Z) - A Two-stream Neural Network for Pose-based Hand Gesture Recognition [23.50938160992517]
Pose based hand gesture recognition has been widely studied in the recent years.
This paper proposes a two-stream neural network with one stream being a self-attention based graph convolutional network (SAGCN)
The residual-connection enhanced Bi-IndRNN extends an IndRNN with the capability of bidirectional processing for temporal modelling.
arXiv Detail & Related papers (2021-01-22T03:22:26Z) - Temporal Graph Modeling for Skeleton-based Action Recognition [25.788239844759246]
We propose a Temporal Enhanced Graph Convolutional Network (TE-GCN) to capture complex temporal dynamic.
The constructed temporal relation graph explicitly builds connections between semantically related temporal features.
Experiments are performed on two widely used large-scale datasets.
arXiv Detail & Related papers (2020-12-16T09:02:47Z) - On the spatial attention in Spatio-Temporal Graph Convolutional Networks
for skeleton-based human action recognition [97.14064057840089]
Graphal networks (GCNs) promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph.
Most of the recently proposed G-temporal-based methods improve the performance by learning the graph structure at each layer of the network.
arXiv Detail & Related papers (2020-11-07T19:03:04Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.