Pose-Guided Graph Convolutional Networks for Skeleton-Based Action
Recognition
- URL: http://arxiv.org/abs/2210.06192v1
- Date: Mon, 10 Oct 2022 02:08:49 GMT
- Title: Pose-Guided Graph Convolutional Networks for Skeleton-Based Action
Recognition
- Authors: Han Chen and Yifan Jiang and Hanseok Ko
- Abstract summary: Graph convolutional networks (GCNs) can model the human body skeletons as spatial and temporal graphs.
In this work, we propose pose-guided GCN (PG-GCN), a multi-modal framework for high-performance human action recognition.
The core idea of this module is to utilize a trainable graph to aggregate features from the skeleton stream with those of the pose stream, which leads to a network with more robust feature representation ability.
- Score: 32.07659338674024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph convolutional networks (GCNs), which can model the human body skeletons
as spatial and temporal graphs, have shown remarkable potential in
skeleton-based action recognition. However, in existing GCN-based methods, the
graph-structured representation of the human skeleton is difficult to fuse with
other modalities, especially in the early stages. This may limit
their scalability and performance in action recognition tasks. In addition, the
pose information, which naturally contains informative and discriminative clues
for action recognition, is rarely explored together with skeleton data in
existing methods. In this work, we propose pose-guided GCN (PG-GCN), a
multi-modal framework for high-performance human action recognition. In
particular, a multi-stream network is constructed to simultaneously explore the
robust features from both the pose and skeleton data, while a dynamic attention
module is designed for early-stage feature fusion. The core idea of this module
is to utilize a trainable graph to aggregate features from the skeleton stream
with those of the pose stream, which leads to a network with more robust feature
representation ability. Extensive experiments show that the proposed PG-GCN can
achieve state-of-the-art performance on the NTU RGB+D 60 and NTU RGB+D 120
datasets.
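The fusion mechanism lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch illustration of the core idea, a trainable graph that aggregates skeleton-stream features with those of the pose stream for early-stage fusion; the module name, tensor shapes, and softmax normalisation are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of trainable-graph feature fusion.
import torch
import torch.nn as nn

class TrainableGraphFusion(nn.Module):
    """Fuse two feature streams through a learnable joint-to-joint graph."""
    def __init__(self, num_joints: int, channels: int):
        super().__init__()
        # Trainable adjacency over the joints, shared across time.
        self.graph = nn.Parameter(
            torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints))
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, skel: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # skel, pose: (batch, channels, frames, joints)
        attn = torch.softmax(self.graph, dim=-1)                  # row-normalised graph
        pose_agg = torch.einsum("bctj,jk->bctk", pose, attn)      # aggregate along the graph
        fused = torch.cat([skel, pose_agg], dim=1)                # early-stage fusion
        return self.proj(fused)

fusion = TrainableGraphFusion(num_joints=25, channels=64)
out = fusion(torch.randn(2, 64, 30, 25), torch.randn(2, 64, 30, 25))
print(out.shape)  # torch.Size([2, 64, 30, 25])
```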
Related papers
- SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by several factors.
Inspired by recent attention mechanisms, we propose a multi-grain contextual focus module, termed MCF, to capture action-associated relation information.
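As a rough illustration of the contextual-focus idea (not the MCF module itself), the sketch below re-weights joint features with context pooled at two granularities, per-frame and whole-sequence; all names and shapes are assumed.

```python
# Hedged sketch of contextual-focus attention at two granularities.
import torch
import torch.nn as nn

class ContextualFocus(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.frame_fc = nn.Conv2d(channels, channels, kernel_size=1)
        self.seq_fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints)
        frame_ctx = x.mean(dim=3, keepdim=True)        # per-frame context
        seq_ctx = x.mean(dim=(2, 3), keepdim=True)     # whole-sequence context
        focus = torch.sigmoid(self.frame_fc(frame_ctx) + self.seq_fc(seq_ctx))
        return x * focus                               # re-weight joint features
```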
arXiv Detail & Related papers (2022-03-31T02:45:24Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose-prediction-based auto-encoder in the self-supervised training stage allows the network to learn motion representations from unlabeled data.
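A minimal sketch of this self-supervised setup, with a stand-in encoder in place of the CD-JBF-GCN: a pose-prediction head regresses next-frame joint positions so that unlabeled sequences provide the training signal. All module names and shapes are assumptions.

```python
# Illustrative encoder/decoder with a pose-prediction objective (assumed setup).
import torch
import torch.nn as nn

class PosePredictionAutoEncoder(nn.Module):
    def __init__(self, in_dim: int = 3, hidden: int = 64):
        super().__init__()
        # Stand-in encoder; the paper uses a correlation-driven joint-bone fusion GCN.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_dim, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=1), nn.ReLU(),
        )
        self.pose_head = nn.Conv2d(hidden, in_dim, kernel_size=1)  # decoder

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, frames, joints) joint coordinates
        return self.pose_head(self.encoder(x))

model = PosePredictionAutoEncoder()
seq = torch.randn(4, 3, 31, 25)                     # 31 frames, 25 joints
pred = model(seq[:, :, :-1])                        # predict from frames 0..T-1
loss = nn.functional.mse_loss(pred, seq[:, :, 1:])  # self-supervised target: frames 1..T
```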
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
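One plausible reading of the dual-head design is two branches operating at different temporal resolutions whose outputs are merged; the sketch below illustrates that pattern under assumed shapes, not the paper's exact architecture.

```python
# Rough sketch: two temporal resolutions, merged (assumed interpretation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualResolutionBranches(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fine = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.coarse = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints)
        fine = self.fine(x)                                        # full temporal resolution
        coarse = self.coarse(F.avg_pool2d(x, kernel_size=(2, 1)))  # half resolution
        coarse = F.interpolate(coarse, size=fine.shape[2:])        # back to full length
        return fine + coarse                                       # merge the two heads
```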
arXiv Detail & Related papers (2021-08-10T09:25:07Z)
- HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition [73.64451471862613]
We propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition.
The joint self-attention module is used to capture spatial features of fingers, while the finger self-attention module is designed to aggregate features of the whole hand.
Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.
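The hierarchy described above can be illustrated as attention over the joints of each finger followed by attention across finger-level features; the 4-joints-per-finger split and dimensions below are assumptions, not the HAN implementation.

```python
# Illustrative hierarchical self-attention over a hand skeleton (assumed layout).
import torch
import torch.nn as nn

class HierarchicalHandAttention(nn.Module):
    def __init__(self, dim: int, joints_per_finger: int = 4, fingers: int = 5):
        super().__init__()
        self.jpf, self.fingers = joints_per_finger, fingers
        self.joint_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.finger_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, fingers * joints_per_finger, dim)
        b, _, d = x.shape
        joints = x.reshape(b * self.fingers, self.jpf, d)
        joints, _ = self.joint_attn(joints, joints, joints)    # within each finger
        fingers = joints.mean(dim=1).reshape(b, self.fingers, d)
        hand, _ = self.finger_attn(fingers, fingers, fingers)  # across fingers
        return hand.mean(dim=1)                                # whole-hand feature

attn = HierarchicalHandAttention(dim=64)
out = attn(torch.randn(2, 20, 64))  # 2 hands, 5 fingers x 4 joints
print(out.shape)                    # torch.Size([2, 64])
```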
arXiv Detail & Related papers (2021-06-25T02:15:53Z)
- Multi Scale Temporal Graph Networks For Skeleton-based Action Recognition [5.970574258839858]
Graph convolutional networks (GCNs) can effectively capture the features of related nodes and improve the performance of the model.
Existing GCN-based methods have two problems. First, the consistency of temporal and spatial features is ignored, since features are extracted node by node and frame by frame.
We propose a novel model called Temporal Graph Networks (TGN) for action recognition.
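One common way to realise multi-scale temporal modelling, offered here only as an illustrative sketch rather than the TGN architecture, is parallel temporal convolutions with different dilations over per-joint features.

```python
# Sketch of multi-scale temporal convolution (assumed realisation of the idea).
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=(3, 1),
                      padding=(d, 0), dilation=(d, 1))
            for d in dilations
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints); sum over temporal scales
        return torch.stack([b(x) for b in self.branches]).sum(dim=0)
```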
arXiv Detail & Related papers (2020-12-05T08:08:25Z)
- Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
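A hedged sketch of such a building block: a spatial path (graph aggregation over a fixed adjacency) and a temporal path run in parallel, and their outputs are concatenated. The adjacency and channel split below are assumptions for illustration.

```python
# Illustrative block with parallel spatial and temporal paths (assumed details).
import torch
import torch.nn as nn

class STInceptionBlock(nn.Module):
    def __init__(self, channels: int, adjacency: torch.Tensor):
        super().__init__()
        self.register_buffer("A", adjacency)  # (joints, joints), fixed graph
        self.spatial = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.temporal = nn.Conv2d(channels, channels // 2,
                                  kernel_size=(3, 1), padding=(1, 0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints)
        spatial = torch.einsum("bctj,jk->bctk", self.spatial(x), self.A)
        temporal = self.temporal(x)
        return torch.cat([spatial, temporal], dim=1)  # aggregate both paths
```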
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
- Structure-Aware Human-Action Generation [126.05874420893092]
Graph convolutional networks (GCNs) are a promising way to leverage structure information to learn structure representations.
We propose a variant of GCNs to leverage the powerful self-attention mechanism to adaptively sparsify a complete action graph in the temporal space.
Our method can dynamically attend to important past frames and construct a sparse graph to apply in the GCN framework, capturing the structure information in action sequences well.
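The sparsification step can be sketched as scoring past frames with self-attention and keeping only the top-k edges per frame; the top-k rule and shapes below are illustrative assumptions, not the paper's formulation.

```python
# Sketch: self-attention over past frames, sparsified to top-k edges per row.
import torch
import torch.nn as nn

class SparseTemporalAttention(nn.Module):
    def __init__(self, dim: int, k: int = 8):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kproj = nn.Linear(dim, dim)
        self.k = k  # assumes the sequence length is at least k

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, dim); returns a (batch, time, time) sparse graph
        scores = self.q(frames) @ self.kproj(frames).transpose(1, 2)
        scores = scores.masked_fill(  # only attend to past frames
            torch.triu(torch.ones_like(scores[0]), diagonal=1).bool(), float("-inf"))
        kth = scores.topk(self.k, dim=-1).values[..., -1:]   # k-th largest per row
        sparse = scores.masked_fill(scores < kth, float("-inf"))
        return torch.softmax(sparse, dim=-1)                 # sparse attention graph
```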
arXiv Detail & Related papers (2020-07-04T00:18:27Z)
- Unifying Graph Embedding Features with Graph Convolutional Networks for Skeleton-based Action Recognition [18.001693718043292]
We propose a novel framework, which unifies 15 graph embedding features into the graph convolutional network for human action recognition.
Our model is validated on three large-scale datasets, namely NTU-RGB+D, Kinetics and SYSU-3D.
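As a minimal, assumed illustration of unifying several graph-embedding features, extra per-joint channels from each embedding can be concatenated and projected before the usual GCN layers; the number and nature of the embeddings are assumptions.

```python
# Sketch: concatenate per-joint graph-embedding channels, then project.
import torch
import torch.nn as nn

class GraphEmbeddingFusion(nn.Module):
    def __init__(self, in_dim: int, embed_dims: list[int], out_dim: int):
        super().__init__()
        self.proj = nn.Conv2d(in_dim + sum(embed_dims), out_dim, kernel_size=1)

    def forward(self, x: torch.Tensor, embeddings: list[torch.Tensor]) -> torch.Tensor:
        # x: (batch, in_dim, frames, joints); embeddings: (batch, e_i, frames, joints)
        return self.proj(torch.cat([x, *embeddings], dim=1))
```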
arXiv Detail & Related papers (2020-03-06T02:31:26Z)