Self-attention based anchor proposal for skeleton-based action
recognition
- URL: http://arxiv.org/abs/2112.09413v1
- Date: Fri, 17 Dec 2021 10:05:57 GMT
- Title: Self-attention based anchor proposal for skeleton-based action
recognition
- Authors: Ruijie Hou, Zhao Wang
- Abstract summary: Skeleton sequences are widely used for action recognition tasks due to their lightweight and compact characteristics.
Recent graph convolutional network (GCN) approaches have achieved great success for skeleton-based action recognition.
We propose a novel self-attention based skeleton-anchor proposal (SAP) module to comprehensively model the internal relations of a human body.
- Score: 3.611872164105537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Skeleton sequences are widely used for action recognition tasks due to
their lightweight and compact characteristics. Recent graph convolutional
network (GCN) approaches have achieved great success in skeleton-based action
recognition thanks to their graceful ability to model non-Euclidean data. GCNs
can exploit short-range joint dependencies, but they cannot directly model the
relations between distant joints that are vital for distinguishing various
actions. Thus, many GCN approaches employ hierarchical mechanisms to aggregate
wider-range neighborhood information. We propose a novel self-attention based
skeleton-anchor proposal (SAP) module to comprehensively model the internal
relations of a human body for motion feature learning. The proposed SAP module
explores the inherent relationships within the human body using a triplet
representation that encodes high-order angle information, rather than the fixed
pair-wise bone connections used in existing hierarchical GCN approaches. A
self-attention based anchor selection method is designed in the proposed SAP
module to extract the root point for encoding angular information. By coupling
the proposed SAP module with popular spatial-temporal graph neural networks,
e.g. MSG3D, the combined model achieves new state-of-the-art accuracy on
challenging benchmark datasets. Further ablation studies demonstrate the
effectiveness of the proposed SAP module, which noticeably improves the
performance of many popular skeleton-based action recognition methods.
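To make the anchor-proposal idea above concrete, the following is a minimal PyTorch sketch rather than the authors' released implementation: a self-attention score over the joints proposes a (soft) anchor, and the angle at that anchor is encoded for every joint pair, i.e. the triplet (joint_i, anchor, joint_j). The class name SAPSketch, the attention-weighted soft anchor, the input shapes, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SAPSketch(nn.Module):
    """Self-attention anchor proposal plus triplet angle encoding (sketch)."""

    def __init__(self, in_channels: int):
        super().__init__()
        # Query/key projections for the self-attention based anchor scoring.
        self.query = nn.Linear(in_channels, in_channels)
        self.key = nn.Linear(in_channels, in_channels)

    def forward(self, feat: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # feat: (N, V, C) per-joint features for one frame
        # pos:  (N, V, 3) 3D joint coordinates for the same frame
        q, k = self.query(feat), self.key(feat)                      # (N, V, C)
        attn = torch.softmax(
            q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)       # (N, V, V)
        # Joints that receive the most attention overall define a soft anchor
        # (attention-weighted average of the joint coordinates).
        anchor_w = torch.softmax(attn.sum(dim=1), dim=-1)            # (N, V)
        anchor = (anchor_w.unsqueeze(-1) * pos).sum(1, keepdim=True)  # (N, 1, 3)
        # Angle encoding: cosine of the angle at the anchor for every joint
        # pair (i, j), i.e. the triplet (joint_i, anchor, joint_j).
        vec = F.normalize(pos - anchor, dim=-1)                      # (N, V, 3)
        return vec @ vec.transpose(1, 2)                             # (N, V, V)


# Toy usage: batch of 2 frames, 25 joints, 64-dim joint features.
sap = SAPSketch(in_channels=64)
angles = sap(torch.randn(2, 25, 64), torch.randn(2, 25, 3))
print(angles.shape)  # torch.Size([2, 25, 25])
```

In practice such a per-frame angular feature map could be fed alongside the usual joint and bone inputs of a spatial-temporal GCN backbone such as MSG3D; how the features are fused is left open here.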
Related papers
- Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition [0.0]
In this paper, we propose self-attention GCN hybrid model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN.
We utilize spatial self-attention module with adaptive topology to understand intra-frame interactions within a frame among different body parts, and temporal self-attention module to examine correlations between frames of a node.
arXiv Detail & Related papers (2024-04-03T10:25:45Z)
- Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness [24.83836008577395]
Graph Convolutional Networks (GCNs) have long defined the state-of-the-art in skeleton-based action recognition.
They tend to optimize the adjacency matrix jointly with the model weights.
This process causes a gradual decay of bone connectivity data, culminating in a model indifferent to the very topology it sought to map.
We propose an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.
arXiv Detail & Related papers (2023-05-19T06:40:12Z)
- Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition [32.07659338674024]
Graph convolutional networks (GCNs) can model the human body skeletons as spatial and temporal graphs.
In this work, we propose pose-guided GCN (PG-GCN), a multi-modal framework for high-performance human action recognition.
The core idea of this module is to utilize a trainable graph to aggregate features from the skeleton stream with those of the pose stream, which leads to a network with a more robust feature representation ability.
arXiv Detail & Related papers (2022-10-10T02:08:49Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by the following factors.
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network (MS-SGN) is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition [73.64451471862613]
We propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition.
A joint self-attention module is used to capture spatial features of the fingers, and a finger self-attention module is designed to aggregate features of the whole hand.
Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.
arXiv Detail & Related papers (2021-06-25T02:15:53Z)
- Multi-Level Graph Encoding with Structural-Collaborative Relation Learning for Skeleton-Based Person Re-Identification [11.303008512400893]
Skeleton-based person re-identification (Re-ID) is an emerging open topic providing great value for safety-critical applications.
Existing methods typically extract hand-crafted features or model skeleton dynamics from the trajectory of body joints.
We propose a Multi-level Graph encoding approach with Structural-Collaborative Relation learning (MG-SCR) to encode discriminative graph features for person Re-ID.
arXiv Detail & Related papers (2021-06-06T09:09:57Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.