Leveraging Spatio-Temporal Dependency for Skeleton-Based Action
Recognition
- URL: http://arxiv.org/abs/2212.04761v2
- Date: Wed, 19 Jul 2023 02:20:18 GMT
- Title: Leveraging Spatio-Temporal Dependency for Skeleton-Based Action
Recognition
- Authors: Jungho Lee, Minhyeok Lee, Suhwan Cho, Sungmin Woo, Sungjun Jang, and
Sangyoun Lee
- Abstract summary: Skeleton-based action recognition has attracted considerable attention due to its compact representation of the human body's skeletal structure.
Many recent methods have achieved remarkable performance using graph convolutional networks (GCNs) and convolutional neural networks (CNNs).
- Score: 9.999149887494646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skeleton-based action recognition has attracted considerable attention due to
its compact representation of the human body's skeletal structure. Many recent
methods have achieved remarkable performance using graph convolutional networks
(GCNs) and convolutional neural networks (CNNs), which extract spatial and
temporal features, respectively. Although spatial and temporal dependencies in
the human skeleton have been explored separately, spatio-temporal dependency is
rarely considered. In this paper, we propose the Spatio-Temporal Curve Network
(STC-Net) to effectively leverage the spatio-temporal dependency of the human
skeleton. Our proposed network consists of two novel elements: 1) The
Spatio-Temporal Curve (STC) module; and 2) Dilated Kernels for Graph
Convolution (DK-GC). The STC module dynamically adjusts the receptive field by
identifying meaningful node connections between every adjacent frame and
generating spatio-temporal curves based on the identified node connections,
providing an adaptive spatio-temporal coverage. In addition, we propose DK-GC
to consider long-range dependencies, which results in a large receptive field
without any additional parameters by applying an extended kernel to the given
adjacency matrices of the graph. Our STC-Net combines these two modules and
achieves state-of-the-art performance on four skeleton-based action recognition
benchmarks.
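The two ideas in the abstract can be illustrated with a small, heavily simplified sketch. It is not the authors' implementation, and every name in it (`hop_distances`, `dilated_adjacency`, `trace_curve`) is hypothetical. It assumes that a "dilated" kernel can be approximated by connecting joints that are exactly `dilation` hops apart in the given skeleton graph, which enlarges the receptive field without introducing learnable parameters, and that a spatio-temporal curve can be approximated by greedily hopping, frame by frame, to the node whose features have the largest dot product with the current node's features.

```python
from collections import deque


def hop_distances(adj):
    """All-pairs hop counts on an unweighted graph via BFS; -1 means unreachable."""
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[src][v] == -1:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def dilated_adjacency(adj, dilation):
    """Assumed stand-in for a dilated kernel: link joints exactly `dilation` hops apart.

    The result is derived purely from the given adjacency matrix, so it adds
    no trainable parameters.
    """
    dist = hop_distances(adj)
    n = len(adj)
    return [[1 if dist[i][j] == dilation else 0 for j in range(n)]
            for i in range(n)]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def trace_curve(frames, start_node):
    """Greedy curve through time (an assumption, not the paper's exact rule).

    `frames[t][j]` is the feature vector of node j at frame t. Starting from
    `start_node` in frame 0, move at each step to the node in the next frame
    with the highest dot-product similarity to the current node.
    """
    curve = [start_node]
    current = start_node
    for t in range(len(frames) - 1):
        cur_feat = frames[t][current]
        nxt = frames[t + 1]
        current = max(range(len(nxt)), key=lambda j: dot(cur_feat, nxt[j]))
        curve.append(current)
    return curve


# A 5-joint chain skeleton (0-1-2-3-4) used as a toy example.
chain = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
two_hop = dilated_adjacency(chain, 2)
```

With `dilation=1` the function returns the original adjacency matrix, so larger dilations can be read as widening the spatial receptive field over the same fixed skeleton. The curve tracer returns one node index per frame, which is the kind of adaptive cross-frame path the abstract describes, although the actual module may select and aggregate connections quite differently.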
Related papers
- Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition [0.0]
In this paper, we propose a self-attention GCN hybrid model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN.
We utilize a spatial self-attention module with adaptive topology to understand intra-frame interactions among different body parts, and a temporal self-attention module to examine correlations between frames of a node.
arXiv Detail & Related papers (2024-04-03T10:25:45Z)
- Mending of Spatio-Temporal Dependencies in Block Adjacency Matrix [3.529869282529924]
We propose a novel end-to-end learning architecture designed to mend the temporal dependencies, resulting in a well-connected graph.
Our methodology demonstrates superior performance on benchmark datasets, such as SurgVisDom and C2D2.
arXiv Detail & Related papers (2023-10-04T06:42:33Z)
- DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition [77.87404524458809]
We propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN).
It consists of two modules, DG-GCN and DG-TCN, respectively, for spatial and temporal modeling.
DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
arXiv Detail & Related papers (2022-10-12T03:17:37Z) - Multi-Scale Spatial Temporal Graph Convolutional Network for
Skeleton-Based Action Recognition [13.15374205970988]
We present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module.
The MS-GC and MT-GC modules decompose the corresponding local graph convolution into a set of sub-graph convolutions, forming a hierarchical residual architecture.
We propose a multi-scale spatial temporal graph convolutional network (MST-GCN), which stacks multiple blocks to learn effective motion representations for action recognition.
arXiv Detail & Related papers (2022-06-27T03:17:33Z) - Skeleton-based Action Recognition via Temporal-Channel Aggregation [5.620303498964992]
We propose a Temporal-Channel Aggregation Graph Conal Networks (TCA-CN) to learn spatial and temporal topologies.
In addition, we extract multi-scale skeletal temporal modeling and fuse them with priori skeletal knowledge with an attention mechanism.
arXiv Detail & Related papers (2022-05-31T16:28:30Z) - Spatio-Temporal Joint Graph Convolutional Networks for Traffic
Forecasting [75.10017445699532]
Recent have shifted their focus towards formulating traffic forecasting as atemporal graph modeling problem.
We propose a novel approach for accurate traffic forecasting on road networks over multiple future time steps.
arXiv Detail & Related papers (2021-11-25T08:45:14Z) - Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z) - Temporal Graph Modeling for Skeleton-based Action Recognition [25.788239844759246]
We propose a Temporal Enhanced Graph Convolutional Network (TE-GCN) to capture complex temporal dynamic.
The constructed temporal relation graph explicitly builds connections between semantically related temporal features.
Experiments are performed on two widely used large-scale datasets.
arXiv Detail & Related papers (2020-12-16T09:02:47Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel temporal-temporal convolution block that is capable of extracting at multiple resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - On the spatial attention in Spatio-Temporal Graph Convolutional Networks
for skeleton-based human action recognition [97.14064057840089]
Graphal networks (GCNs) promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph.
Most of the recently proposed G-temporal-based methods improve the performance by learning the graph structure at each layer of the network.
arXiv Detail & Related papers (2020-11-07T19:03:04Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.