Skeleton-based Action Recognition via Temporal-Channel Aggregation
- URL: http://arxiv.org/abs/2205.15936v1
- Date: Tue, 31 May 2022 16:28:30 GMT
- Title: Skeleton-based Action Recognition via Temporal-Channel Aggregation
- Authors: Shengqin Wang, Yongji Zhang, Fenglin Wei, Kai Wang, Minghao Zhao, Yu Jiang
- Abstract summary: We propose Temporal-Channel Aggregation Graph Convolutional Networks (TCA-GCN) to learn spatial and temporal topologies.
In addition, we extract multi-scale skeletal features in temporal modeling and fuse them with prior skeletal knowledge using an attention mechanism.
- Score: 5.620303498964992
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Skeleton-based action recognition methods are limited by the semantic
extraction of spatio-temporal skeletal maps. However, current methods have
difficulty effectively combining features from the temporal and spatial graph
dimensions, and tend to emphasize one at the expense of the other. In
this paper, we propose a Temporal-Channel Aggregation Graph Convolutional
Networks (TCA-GCN) to learn spatial and temporal topologies dynamically and
efficiently aggregate topological features in different temporal and channel
dimensions for skeleton-based action recognition. We use the Temporal
Aggregation module to learn temporal dimensional features and the Channel
Aggregation module to efficiently combine spatial dynamic topological features
learned channel-wise with temporal dynamic topological features. In
addition, we extract multi-scale skeletal features in temporal modeling and
fuse them with prior skeletal knowledge using an attention mechanism. Extensive
experiments show that our model results outperform state-of-the-art methods on
the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
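The abstract does not include implementation details, but the core channel-aggregation idea (each feature channel aggregates joints over its own dynamically refined topology) can be sketched minimally. All names and tensor shapes below are assumptions for illustration, not the authors' code:

```python
import numpy as np

def channel_aggregation(x, a_shared, a_dyn):
    """Aggregate joint features with a per-channel dynamic topology.

    x:        (C, T, V) features - channels, frames, joints
    a_shared: (V, V) static skeleton adjacency
    a_dyn:    (C, V, V) learned channel-specific refinements
    """
    c, t, v = x.shape
    out = np.empty_like(x)
    for ch in range(c):
        # each channel aggregates over its own refined topology
        a = a_shared + a_dyn[ch]
        out[ch] = x[ch] @ a.T  # (T, V) @ (V, V) -> (T, V)
    return out

# toy example: 2 channels, 3 frames, 4 joints
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4))
a_shared = np.eye(4)           # placeholder skeleton graph
a_dyn = np.zeros((2, 4, 4))    # no refinement in this toy case
y = channel_aggregation(x, a_shared, a_dyn)
assert y.shape == (2, 3, 4)
# identity topology with zero refinement leaves features unchanged
assert np.allclose(y, x)
```

In the actual model the refinements would be learned from the input (e.g. from pairwise joint correlations) rather than fixed; the sketch only shows why per-channel topologies give the aggregation more flexibility than one shared graph.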
Related papers
- Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition [0.0]
In this paper, we propose self-attention GCN hybrid model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN.
We utilize spatial self-attention module with adaptive topology to understand intra-frame interactions within a frame among different body parts, and temporal self-attention module to examine correlations between frames of a node.
arXiv Detail & Related papers (2024-04-03T10:25:45Z)
- A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation [89.86345494602642]
Existing methods are limited by weak temporal modeling capability.
We propose a Decoupled Spatial-Temporal Framework (DeST) to address these issues.
DeST significantly outperforms current state-of-the-art methods with less computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z)
- Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z)
- One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching [77.6989219290789]
One-shot skeleton action recognition aims to learn a skeleton action recognition model with a single training sample.
This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching.
arXiv Detail & Related papers (2023-07-14T11:52:10Z)
- Temporal-Channel Topology Enhanced Network for Skeleton-Based Action Recognition [26.609509266693077]
We propose a novel CNN architecture, Temporal-Channel Topology Enhanced Network (TCTE-Net), to learn spatial and temporal topologies for skeleton-based action recognition.
TCTE-Net shows state-of-the-art performance compared to CNN-based methods and achieves superior performance compared to GCN-based methods.
arXiv Detail & Related papers (2023-02-25T03:09:07Z)
- Dynamic Spatial-temporal Hypergraph Convolutional Network for Skeleton-based Action Recognition [4.738525281379023]
Skeleton-based action recognition relies on the extraction of spatial-temporal topological information.
This paper proposes a dynamic spatial-temporal hypergraph convolutional network (DST-HCN) to capture spatial-temporal information for skeleton-based action recognition.
arXiv Detail & Related papers (2023-02-17T04:42:19Z)
- Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition [13.15374205970988]
We present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module.
The MS-GC and MT-GC modules decompose the corresponding local graph convolution into a set of sub-graph convolutions, forming a hierarchical residual architecture.
We propose a multi-scale spatial temporal graph convolutional network (MST-GCN), which stacks multiple blocks to learn effective motion representations for action recognition.
arXiv Detail & Related papers (2022-06-27T03:17:33Z)
- Slow-Fast Visual Tempo Learning for Video-based Action Recognition [78.3820439082979]
Action visual tempo characterizes the dynamics and the temporal scale of an action.
Previous methods capture the visual tempo either by sampling raw videos with multiple rates, or by hierarchically sampling backbone features.
We propose a Temporal Correlation Module (TCM) to extract action visual tempo from low-level backbone features at a single layer.
arXiv Detail & Related papers (2022-02-24T14:20:04Z)
- Temporal Graph Modeling for Skeleton-based Action Recognition [25.788239844759246]
We propose a Temporal Enhanced Graph Convolutional Network (TE-GCN) to capture complex temporal dynamics.
The constructed temporal relation graph explicitly builds connections between semantically related temporal features.
Experiments are performed on two widely used large-scale datasets.
arXiv Detail & Related papers (2020-12-16T09:02:47Z)
- Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z)
- Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
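Several of the entries above (the MS-GC/MT-GC modules, MS-G3D) build on multi-scale graph convolution over the skeleton, which aggregates features over neighbourhoods of increasing hop radius. A minimal sketch of that common idea using powers of a normalized adjacency matrix (an assumed generic formulation, not any one paper's exact operator):

```python
import numpy as np

def normalize(a):
    """Row-normalize an adjacency matrix after adding self-loops."""
    a = a + np.eye(a.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def multi_scale_gconv(x, a, scales=3):
    """Concatenate features aggregated over 1..k-hop neighbourhoods.

    x: (V, C) joint features, a: (V, V) skeleton adjacency
    returns (V, C * scales)
    """
    a_hat = normalize(a)
    outs, a_k = [], np.eye(a.shape[0])
    for _ in range(scales):
        a_k = a_k @ a_hat      # next power = one hop further out
        outs.append(a_k @ x)   # aggregate over that radius
    return np.concatenate(outs, axis=1)

# toy 3-joint chain 0-1-2 with 2 feature channels per joint
a = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
x = np.arange(6, dtype=float).reshape(3, 2)
y = multi_scale_gconv(x, a, scales=2)
assert y.shape == (3, 4)  # 2 channels x 2 scales per joint
```

Disentangled variants (as in MS-G3D) replace the raw adjacency powers with k-hop-only masks so distant neighbourhoods do not re-weight close ones; the sketch shows only the baseline multi-scale aggregation they improve upon.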
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.