Multi-Scale Spatial Temporal Graph Convolutional Network for
Skeleton-Based Action Recognition
- URL: http://arxiv.org/abs/2206.13028v1
- Date: Mon, 27 Jun 2022 03:17:33 GMT
- Title: Multi-Scale Spatial Temporal Graph Convolutional Network for
Skeleton-Based Action Recognition
- Authors: Zhan Chen, Sicheng Li, Bing Yang, Qinghan Li, Hong Liu
- Abstract summary: We present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module.
The MS-GC and MT-GC modules decompose the corresponding local graph convolution into a set of sub-graph convolutions, forming a hierarchical residual architecture.
We propose a multi-scale spatial temporal graph convolutional network (MST-GCN), which stacks multiple blocks to learn effective motion representations for action recognition.
- Score: 13.15374205970988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph convolutional networks have been widely used for skeleton-based action
recognition due to their excellent modeling ability of non-Euclidean data. As
the graph convolution is a local operation, it can only utilize the short-range
joint dependencies and short-term trajectory but fails to directly model the
distant joints relations and long-range temporal information that are vital to
distinguishing various actions. To solve this problem, we present a multi-scale
spatial graph convolution (MS-GC) module and a multi-scale temporal graph
convolution (MT-GC) module to enrich the receptive field of the model in
spatial and temporal dimensions. Concretely, the MS-GC and MT-GC modules
decompose the corresponding local graph convolution into a set of sub-graph
convolutions, forming a hierarchical residual architecture. Without introducing
additional parameters, the features are processed by a series of sub-graph
convolutions, and each node completes multiple spatial and temporal
aggregations with its neighborhood. The equivalent receptive field is thereby
enlarged, so the model can capture both short- and long-range dependencies in
the spatial and temporal domains. By coupling these two
modules as a basic block, we further propose a multi-scale spatial temporal
graph convolutional network (MST-GCN), which stacks multiple blocks to learn
effective motion representations for action recognition. The proposed MST-GCN
achieves remarkable performance on three challenging benchmark datasets, NTU
RGB+D, NTU-120 RGB+D and Kinetics-Skeleton, for skeleton-based action
recognition.
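To make the hierarchical residual decomposition described in the abstract concrete, the following is a minimal PyTorch sketch under our own assumptions: channels are split into groups, each group passes through a small sub-graph convolution, and each group's output feeds the next group's input, so later groups aggregate over progressively larger neighborhoods. The class name `MultiScaleGraphConv`, the identity placeholder adjacency, the per-group 1x1 transforms (standing in for the split weights of the original graph convolution), and the group count are illustrative choices, not the authors' released code; swapping the joint-axis aggregation for a small convolution along the frame axis would give the analogous MT-GC decomposition.

```python
import torch
import torch.nn as nn


class MultiScaleGraphConv(nn.Module):
    """Sketch of a multi-scale spatial graph convolution (MS-GC)-style block.

    Input x has shape (N, C, T, V): batch, channels, frames, joints.
    """

    def __init__(self, channels: int, num_joints: int, num_groups: int = 4):
        super().__init__()
        assert channels % num_groups == 0, "channels must split evenly into groups"
        self.num_groups = num_groups
        group_c = channels // num_groups
        # Per-group 1x1 transforms stand in for the split weights of the original
        # graph convolution (the paper redistributes parameters rather than adding them).
        self.sub_convs = nn.ModuleList(
            nn.Conv2d(group_c, group_c, kernel_size=1) for _ in range(num_groups - 1)
        )
        # Placeholder adjacency; a real model would use the normalized skeleton graph.
        self.register_buffer("A", torch.eye(num_joints))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        groups = torch.chunk(x, self.num_groups, dim=1)
        out = [groups[0]]          # first group passes through untouched (identity branch)
        prev = groups[0]
        for conv, g in zip(self.sub_convs, groups[1:]):
            # Each later group re-aggregates the previous group's output over the
            # graph, so its effective spatial receptive field grows hop by hop.
            h = torch.einsum("nctv,vw->nctw", g + prev, self.A)
            prev = conv(h)
            out.append(prev)
        return torch.cat(out, dim=1) + x   # hierarchical residual output


if __name__ == "__main__":
    x = torch.randn(2, 64, 30, 25)                  # 2 clips, 64 channels, 30 frames, 25 joints
    block = MultiScaleGraphConv(channels=64, num_joints=25, num_groups=4)
    print(block(x).shape)                           # torch.Size([2, 64, 30, 25])
```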
Related papers
- Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition [0.0]
In this paper, we propose a hybrid self-attention GCN model, the Multi-Scale Spatial-Temporal self-attention (MSST)-GCN.
We utilize a spatial self-attention module with adaptive topology to understand intra-frame interactions among different body parts, and a temporal self-attention module to examine correlations between frames of a node.
arXiv Detail & Related papers (2024-04-03T10:25:45Z)
- Mending of Spatio-Temporal Dependencies in Block Adjacency Matrix [3.529869282529924]
We propose a novel end-to-end learning architecture designed to mend the temporal dependencies, resulting in a well-connected graph.
Our methodology demonstrates superior performance on benchmark datasets, such as SurgVisDom and C2D2.
arXiv Detail & Related papers (2023-10-04T06:42:33Z)
- Leveraging Spatio-Temporal Dependency for Skeleton-Based Action Recognition [9.999149887494646]
Skeleton-based action recognition has attracted considerable attention due to its compact representation of the human body's skeletal structure.
Many recent methods have achieved remarkable performance using graph convolutional networks (GCNs) and convolutional neural networks (CNNs).
arXiv Detail & Related papers (2022-12-09T10:37:22Z)
- SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by several factors.
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Temporal Graph Modeling for Skeleton-based Action Recognition [25.788239844759246]
We propose a Temporal Enhanced Graph Convolutional Network (TE-GCN) to capture complex temporal dynamics.
The constructed temporal relation graph explicitly builds connections between semantically related temporal features.
Experiments are performed on two widely used large-scale datasets.
arXiv Detail & Related papers (2020-12-16T09:02:47Z)
- Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
- Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D, with which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
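One common reading of the "disentangled multi-scale graph convolutions" in the last entry above (MS-G3D) is that plain adjacency powers are replaced by k-hop adjacency matrices in which joints are connected only if their shortest-path distance is exactly k, which keeps distant joints from being drowned out by nearby ones. The NumPy sketch below shows one way such matrices could be built; the function name `k_hop_adjacencies` and the construction are illustrative assumptions, not the authors' released code.

```python
import numpy as np


def k_hop_adjacencies(A: np.ndarray, max_scale: int) -> list:
    """Return [A_0, A_1, ..., A_max_scale] where A_k[i, j] = 1 exactly when the
    shortest-path distance between joints i and j is k (A_0 is the identity)."""
    n = A.shape[0]
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    reach = np.eye(n)                         # joints reachable within k hops (0/1)
    step = (A > 0).astype(float)
    for k in range(1, max_scale + 1):
        reach_next = ((reach + reach @ step) > 0).astype(float)
        newly_reached = (reach_next > 0) & np.isinf(dist)
        dist[newly_reached] = k               # first time reached: distance is exactly k
        reach = reach_next
    return [(dist == k).astype(float) for k in range(max_scale + 1)]


if __name__ == "__main__":
    # Toy 4-joint chain 0-1-2-3: A_2 should connect only joints two hops apart.
    chain = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
    A0, A1, A2 = k_hop_adjacencies(chain, max_scale=2)
    print(A2.astype(int))   # 1s at (0, 2), (1, 3) and their transposes
```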