Spatial Temporal Graph Attention Network for Skeleton-Based Action
Recognition
- URL: http://arxiv.org/abs/2208.08599v1
- Date: Thu, 18 Aug 2022 02:34:46 GMT
- Title: Spatial Temporal Graph Attention Network for Skeleton-Based Action
Recognition
- Authors: Lianyu Hu, Shenglan Liu, Wei Feng
- Abstract summary: Current methods in skeleton-based action recognition mainly focus on capturing long-term temporal dependencies.
We propose a general framework, coined as STGAT, to model cross-spacetime information flow.
STGAT achieves state-of-the-art performance on three large-scale datasets.
- Score: 10.60209288486904
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Current methods in skeleton-based action recognition mainly consider
capturing long-term temporal dependencies, as skeleton sequences are typically
long (>128 frames), which poses a challenging problem for previous approaches.
Under such conditions, short-term dependencies, which are critical for
classifying similar actions, are rarely modeled formally. Most current
approaches consist of interleaved spatial-only and temporal-only modules, in
which direct information flow among joints in adjacent frames is hindered, so
they are inferior at capturing short-term motion and distinguishing similar
action pairs. To handle this limitation, we propose a general framework, coined
STGAT, to model cross-spacetime information flow. It equips the spatial-only
modules with spatial-temporal modeling for regional perception. While STGAT is
theoretically effective for spatial-temporal modeling, we propose three simple
modules to reduce local spatial-temporal feature redundancy and further unlock
the potential of STGAT, which (1) narrow the scope of the self-attention
mechanism, (2) dynamically weight joints along the temporal dimension, and (3)
separate subtle motion from static features, respectively. As a robust feature
extractor, STGAT generalizes better than previous methods when classifying
similar actions, as evidenced by both qualitative and quantitative results.
STGAT achieves state-of-the-art performance on three large-scale datasets: NTU
RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400. Code is released.
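The cross-spacetime idea above can be illustrated with a minimal sketch: self-attention applied jointly over the joints of a few consecutive frames, so information flows directly between joints in adjacent frames rather than only through interleaved spatial-only and temporal-only modules. The PyTorch code below is an assumed illustration of such a narrowed-scope spatio-temporal attention block; the class name, window size, and dimensions are placeholders and do not come from the authors' released implementation.

```python
import torch
import torch.nn as nn


class LocalSpatioTemporalAttention(nn.Module):
    """Self-attention over all joints inside a short temporal window.

    Restricting attention to `window_size` consecutive frames loosely mirrors
    the idea of narrowing the self-attention scope so that short-term motion
    between adjacent frames is modeled directly.
    """

    def __init__(self, channels: int, num_heads: int = 4, window_size: int = 3):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, channels)
        B, T, V, C = x.shape
        W = self.window_size
        pad = (W - T % W) % W                 # pad so frames divide into windows
        if pad:
            x = torch.cat([x, x[:, -1:].expand(B, pad, V, C)], dim=1)
        Tp = x.shape[1]
        # Each attention call spans all joints of all frames in one window,
        # giving direct cross-spacetime information flow within that window.
        x = x.reshape(B * (Tp // W), W * V, C)
        y, _ = self.attn(x, x, x)
        y = self.norm(x + y)                  # residual connection + LayerNorm
        return y.reshape(B, Tp, V, C)[:, :T]  # restore shape, drop padding


if __name__ == "__main__":
    # Toy input: 2 sequences, 16 frames, 25 joints (NTU RGB+D layout), 64 channels.
    feats = torch.randn(2, 16, 25, 64)
    out = LocalSpatioTemporalAttention(channels=64)(feats)
    print(out.shape)  # torch.Size([2, 16, 25, 64])
```

Stacking such blocks in place of spatial-only attention layers is one plausible reading of "equipping the spatial-only modules with spatial-temporal modeling for regional perception"; the exact module design in STGAT may differ.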
Related papers
- Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition [10.048809585477555]
Skeleton-aware sign language recognition has gained popularity due to its ability to remain unaffected by background information.
Current methods utilize spatial graph modules and temporal modules to capture spatial and temporal features, respectively.
We propose a new spatial architecture consisting of two concurrent branches, which build input-sensitive joint relationships.
We then propose a new temporal module to model multi-scale temporal information to capture complex human dynamics.
arXiv Detail & Related papers (2024-03-19T07:42:57Z) - A Decoupled Spatio-Temporal Framework for Skeleton-based Action
Segmentation [89.86345494602642]
Existing methods are limited by weak temporal modeling capability.
We propose a Decoupled Spatio-Temporal Framework (DeST) to address these issues.
DeST significantly outperforms current state-of-the-art methods with lower computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Multi-Scale Spatial Temporal Graph Convolutional Network for
Skeleton-Based Action Recognition [13.15374205970988]
We present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module.
The MS-GC and MT-GC modules decompose the corresponding local graph convolution into a set of sub-graph convolutions, forming a hierarchical residual architecture (a minimal sketch of this decomposition follows the related-papers list).
We propose a multi-scale spatial temporal graph convolutional network (MST-GCN), which stacks multiple blocks to learn effective motion representations for action recognition.
arXiv Detail & Related papers (2022-06-27T03:17:33Z) - SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by the following factors.
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action-associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z) - Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based
Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through the tightly coupled multi-temporal representation.
We propose to decouple and recouple the spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
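As referenced in the MST-GCN entry above, the multi-scale spatial module decomposes a local graph convolution into a set of sub-graph convolutions with hierarchical residual connections. The sketch below is an assumed PyTorch rendition of that general idea (channel splits processed sequentially, each split's output feeding the next), not the released MST-GCN code; all names, the Res2Net-style wiring, and the toy adjacency are illustrative.

```python
import torch
import torch.nn as nn


class HierarchicalGraphConv(nn.Module):
    """Channel-split graph convolution with hierarchical residual connections."""

    def __init__(self, channels: int, adjacency: torch.Tensor, num_splits: int = 4):
        super().__init__()
        assert channels % num_splits == 0, "channels must divide evenly into splits"
        self.num_splits = num_splits
        self.register_buffer("A", adjacency)  # (joints, joints), row-normalized
        sub = channels // num_splits
        # One 1x1 transform per sub-convolution; the first split is passed through.
        self.sub_convs = nn.ModuleList(
            [nn.Linear(sub, sub) for _ in range(num_splits - 1)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, channels)
        splits = torch.chunk(x, self.num_splits, dim=-1)
        out, prev = [splits[0]], splits[0]
        for conv, s in zip(self.sub_convs, splits[1:]):
            # Aggregate neighbors via the adjacency, transform, and carry the
            # residual so later splits see an increasingly large neighborhood.
            h = torch.einsum("uv,btvc->btuc", self.A, s + prev)
            prev = conv(h)
            out.append(prev)
        return torch.cat(out, dim=-1)


if __name__ == "__main__":
    V = 25                                        # e.g. NTU RGB+D joint count
    A = torch.eye(V) + torch.rand(V, V).round()   # toy adjacency with self-loops
    A = A / A.sum(dim=1, keepdim=True)            # row-normalize
    layer = HierarchicalGraphConv(channels=64, adjacency=A)
    print(layer(torch.randn(2, 16, V, 64)).shape)  # torch.Size([2, 16, 25, 64])
```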