DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action
Recognition
- URL: http://arxiv.org/abs/2210.05895v1
- Date: Wed, 12 Oct 2022 03:17:37 GMT
- Title: DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action
Recognition
- Authors: Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin
- Abstract summary: We propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN).
It consists of two modules, DG-GCN and DG-TCN, respectively, for spatial and temporal modeling.
DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
- Score: 77.87404524458809
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph convolution networks (GCN) have been widely used in skeleton-based
action recognition. We note that existing GCN-based approaches primarily rely
on prescribed graphical structures (i.e., a manually defined topology of
skeleton joints), which limits their flexibility to capture complicated
correlations between joints. To move beyond this limitation, we propose a new
framework for skeleton-based action recognition, namely Dynamic Group
Spatio-Temporal GCN (DG-STGCN). It consists of two modules, DG-GCN and DG-TCN,
respectively, for spatial and temporal modeling. In particular, DG-GCN uses
learned affinity matrices to capture dynamic graphical structures instead of
relying on a prescribed one, while DG-TCN performs group-wise temporal
convolutions with varying receptive fields and incorporates a dynamic
joint-skeleton fusion module for adaptive multi-level temporal modeling. On a
wide range of benchmarks, including NTURGB+D, Kinetics-Skeleton, BABEL, and
Toyota SmartHome, DG-STGCN consistently outperforms state-of-the-art methods,
often by a notable margin.
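The two modules described above lend themselves to a compact sketch. The NumPy snippet below is an illustrative reconstruction, not the authors' code; the function names, tensor shapes, and the use of an averaging stand-in for a learned temporal convolution are all assumptions. It shows the core ideas: a spatial layer whose affinity matrix is learned rather than prescribed by the skeleton, and a temporal layer that splits channels into groups and applies temporal filters with different dilations (varying receptive fields).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_gcn_layer(x, affinity_logits, weight):
    """Spatial modeling with a learned affinity matrix (DG-GCN idea).

    x              : (T, V, C)  features for T frames, V joints, C channels
    affinity_logits: (V, V)     learnable; no prescribed skeleton topology
    weight         : (C, C_out) feature transform
    """
    A = softmax(affinity_logits, axis=-1)        # row-normalized dynamic graph
    return np.einsum("uv,tvc,cd->tud", A, x, weight)

def groupwise_tcn(x, kernels):
    """Group-wise temporal convolution with varying receptive fields (DG-TCN idea).

    x       : (T, V, C); channels are split into len(kernels) groups
    kernels : list of (kernel_size, dilation) pairs, one per channel group
    """
    T, V, C = x.shape
    groups = np.split(x, len(kernels), axis=-1)
    outs = []
    for g, (k, d) in zip(groups, kernels):
        pad = d * (k - 1) // 2
        gp = np.pad(g, ((pad, pad), (0, 0), (0, 0)))
        # averaging over the dilated temporal window stands in for a learned conv
        out = sum(gp[i * d : i * d + T] for i in range(k)) / k
        outs.append(out)
    return np.concatenate(outs, axis=-1)

T, V, C = 8, 25, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((T, V, C))
y = dynamic_gcn_layer(x, rng.standard_normal((V, V)), rng.standard_normal((C, C)))
z = groupwise_tcn(y, kernels=[(3, 1), (3, 2)])
print(y.shape, z.shape)  # (8, 25, 4) (8, 25, 4)
```

Because the affinity logits are free parameters, gradient descent can discover joint correlations (e.g. hand-to-hand) that a fixed skeleton graph would never connect.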
Related papers
- Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition [0.0]
In this paper, we propose a self-attention GCN hybrid model, Multi-Scale Spatial-Temporal Self-Attention (MSST)-GCN.
We utilize a spatial self-attention module with adaptive topology to understand intra-frame interactions among different body parts, and a temporal self-attention module to examine correlations between frames of a node.
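The spatial self-attention idea can be sketched briefly. This is a generic scaled dot-product attention over joints within one frame, not the MSST-GCN implementation; the function name, projection shapes, and single-head form are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(x, wq, wk, wv):
    """Intra-frame attention among joints: each joint attends to every other
    joint, so the effective topology adapts to the input pose.

    x : (V, C) joint features for one frame; wq/wk/wv : (C, D) projections
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (V, V) data-dependent graph
    return attn @ v                                          # (V, D)

V, C, D = 25, 8, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((V, C))
out = spatial_self_attention(x, *(rng.standard_normal((C, D)) for _ in range(3)))
print(out.shape)  # (25, 8)
```

The (V, V) attention map plays the role of the adjacency matrix in a GCN, but it is recomputed per input rather than fixed.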
arXiv Detail & Related papers (2024-04-03T10:25:45Z) - GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition [10.562869805151411]
Skeleton-based action recognition in videos is an important but challenging task in computer vision.
We propose a principled and parsimonious representation for sequential data by leveraging the Lie group structure.
Our proposed GCN-DevLSTM network consistently improves the strong GCN baseline models and achieves SOTA results with superior robustness in SAR tasks.
arXiv Detail & Related papers (2024-03-22T13:55:52Z) - DD-GCN: Directed Diffusion Graph Convolutional Network for
Skeleton-based Human Action Recognition [10.115283931959855]
Graph Convolutional Networks (GCNs) have been widely used in skeleton-based human action recognition.
In this paper, we construct directed diffusion for action modeling and introduce the activity partition strategy.
We also propose to embed synchronized spatio-temporal semantics.
arXiv Detail & Related papers (2023-08-24T01:53:59Z) - Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action
Recognition through Redefined Skeletal Topology Awareness [24.83836008577395]
Graph Convolutional Networks (GCNs) have long defined the state-of-the-art in skeleton-based action recognition.
They tend to optimize the adjacency matrix jointly with the model weights.
This process causes a gradual decay of bone connectivity data, culminating in a model indifferent to the very topology it sought to map.
We propose an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.
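The graph-distance idea can be illustrated in a few lines: shortest-path (hop) distances between joints are determined by the skeleton alone, so they survive training intact, unlike an adjacency matrix optimized jointly with the weights. A minimal sketch, with an assumed toy 5-joint skeleton and function name:

```python
from collections import deque

def hop_distances(num_joints, bones):
    """All-pairs hop distance on the skeleton graph via BFS from each joint."""
    adj = {j: [] for j in range(num_joints)}
    for a, b in bones:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[src][v] is None:  # first visit = shortest hop count
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    return dist

# toy skeleton: hip-spine-neck chain, neck-l_shoulder, neck-r_shoulder
bones = [(0, 1), (1, 2), (2, 3), (2, 4)]
D = hop_distances(5, bones)
print(D[0])  # [0, 1, 2, 3, 3]
```

Encoding each joint pair by its hop distance preserves the bone connectivity information that a freely learned adjacency matrix gradually washes out.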
arXiv Detail & Related papers (2023-05-19T06:40:12Z) - Skeleton-based Action Recognition via Adaptive Cross-Form Learning [75.92422282666767]
Skeleton-based action recognition aims to project skeleton sequences to action categories, where sequences are derived from multiple forms of pre-detected points.
Existing methods tend to improve GCNs by leveraging multi-form skeletons due to their complementary cues.
We present Adaptive Cross-Form Learning (ACFL), which empowers well-designed GCNs to generate complementary representation from single-form skeletons.
arXiv Detail & Related papers (2022-06-30T07:40:03Z) - SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by the following factors.
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z) - Multi-Scale Semantics-Guided Neural Networks for Efficient
Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network (MS-SGN) is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z) - On the spatial attention in Spatio-Temporal Graph Convolutional Networks
for skeleton-based human action recognition [97.14064057840089]
Graph convolutional networks (GCNs) have achieved promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph.
Most of the recently proposed spatio-temporal methods improve performance by learning the graph structure at each layer of the network.
arXiv Detail & Related papers (2020-11-07T19:03:04Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
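The disentangling idea can be sketched as follows: instead of adjacency powers A^k, whose entries mix contributions from closer hops and revisited nodes, each scale selects exactly the joints at shortest-path distance k. This is an illustrative reconstruction, not the MS-G3D code; the toy chain graph and function name are assumptions.

```python
import numpy as np

def disentangled_adjacency(A, k):
    """Binary mask selecting exactly the k-hop neighbors: entries where the
    shortest hop distance equals k, rather than the k-th adjacency power."""
    V = A.shape[0]
    dist = np.full((V, V), np.inf)
    np.fill_diagonal(dist, 0)
    dist[A > 0] = 1
    for m in range(V):  # Floyd-Warshall shortest hop distances
        dist = np.minimum(dist, dist[:, [m]] + dist[[m], :])
    return (dist == k).astype(int)

# toy 4-joint chain 0-1-2-3
A = np.zeros((4, 4), dtype=int)
for a, b in [(0, 1), (1, 2), (2, 3)]:
    A[a, b] = A[b, a] = 1
M = disentangled_adjacency(A, 2)  # 1s mark exactly the 2-hop pairs
print(M)
```

Stacking such masks for k = 1, 2, ..., K gives multi-scale neighborhoods whose contributions stay separated instead of being dominated by the nearest hops.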
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.