Joint-bone Fusion Graph Convolutional Network for Semi-supervised
Skeleton Action Recognition
- URL: http://arxiv.org/abs/2202.04075v1
- Date: Tue, 8 Feb 2022 16:03:15 GMT
- Title: Joint-bone Fusion Graph Convolutional Network for Semi-supervised
Skeleton Action Recognition
- Authors: Zhigang Tu, Jiaxu Zhang, Hongyan Li, Yujin Chen, and Junsong Yuan
- Abstract summary: We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
- Score: 65.78703941973183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, graph convolutional networks (GCNs) play an increasingly
critical role in skeleton-based human action recognition. However, most
GCN-based methods still have two main limitations: 1) They only consider the
motion information of the joints or process the joints and bones separately,
which are unable to fully explore the latent functional correlation between
joints and bones for action recognition. 2) Most of these works are performed
in the supervised learning way, which heavily relies on massive labeled
training data. To address these issues, we propose a semi-supervised
skeleton-based action recognition method, a direction that has rarely been
explored. We design a novel correlation-driven joint-bone fusion graph
convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head
as a decoder to achieve semi-supervised learning. Specifically, the CD-JBF-GCN
can explore the motion transmission between the joint stream and the bone
stream, thereby promoting both streams to learn more discriminative feature
representations. The pose prediction based auto-encoder in the self-supervised
training stage allows the network to learn motion representation from unlabeled
data, which is essential for action recognition. Extensive experiments on two
popular datasets, i.e., NTU RGB+D and Kinetics-Skeleton, demonstrate that our
model achieves the state-of-the-art performance for semi-supervised
skeleton-based action recognition and is also useful for fully-supervised
methods.
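As background for the two-stream design described above, here is a minimal illustrative sketch (not the authors' code) of how a bone stream is conventionally derived from a joint stream in joint-bone skeleton models: each bone vector is the coordinate difference between a joint and its parent along the skeleton tree. The joint and bone streams are then processed by parallel GCN branches, whose features the paper's CD-JBF-GCN fuses. The 5-joint toy skeleton below is hypothetical.

```python
# Toy 5-joint skeleton given as (child_joint, parent_joint) pairs.
# A real dataset such as NTU RGB+D defines 25 joints; this is only
# an illustrative stand-in.
BONES = [(1, 0), (2, 1), (3, 0), (4, 3)]

def joints_to_bones(joints):
    """Convert joint coordinates to bone vectors.

    joints: list of (x, y, z) tuples, one per joint.
    Returns one bone vector per (child, parent) pair, computed as
    the child's coordinates minus the parent's coordinates.
    """
    bones = []
    for child, parent in BONES:
        cx, cy, cz = joints[child]
        px, py, pz = joints[parent]
        bones.append((cx - px, cy - py, cz - pz))
    return bones

# Example: a single frame of joint positions.
joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0),
          (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(joints_to_bones(joints))
```

In a two-stream setup this conversion is applied per frame, so the bone stream carries relative limb directions and lengths that complement the absolute joint positions.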
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Skeleton-based Action Recognition through Contrasting Two-Stream
Spatial-Temporal Networks [11.66009967197084]
We propose a novel Contrastive GCN-Transformer Network (ConGT) which fuses the spatial and temporal modules in a parallel way.
We conduct experiments on three benchmark datasets, which demonstrate that our model achieves state-of-the-art performance in action recognition.
arXiv Detail & Related papers (2023-01-27T02:12:08Z)
- Pose-Guided Graph Convolutional Networks for Skeleton-Based Action
Recognition [32.07659338674024]
Graph convolutional networks (GCNs) can model the human body skeletons as spatial and temporal graphs.
In this work, we propose pose-guided GCN (PG-GCN), a multi-modal framework for high-performance human action recognition.
The core idea of this module is to utilize a trainable graph to aggregate features from the skeleton stream with that of the pose stream, which leads to a network with more robust feature representation ability.
arXiv Detail & Related papers (2022-10-10T02:08:49Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are available only for the source dataset and unavailable for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Skeletal Human Action Recognition using Hybrid Attention based Graph
Convolutional Network [3.261599248682793]
We propose a new adaptive spatial attention layer that extends local attention map to global based on relative distance and relative angle information.
We design a new initial graph adjacency matrix that connects head, hands and feet, which shows visible improvement in terms of action recognition accuracy.
The proposed model is evaluated on two large-scale and challenging datasets in the field of human activities in daily life.
arXiv Detail & Related papers (2022-07-12T12:22:21Z)
- SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by the following factors.
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z)
- Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based
Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z)
- Non-local Graph Convolutional Network for joint Activity Recognition and
Motion Prediction [2.580765958706854]
3D skeleton-based motion prediction and activity recognition are two interwoven tasks in human behaviour analysis.
We propose a new way to combine the advantages of both graph convolutional neural networks and recurrent neural networks for joint human motion prediction and activity recognition.
arXiv Detail & Related papers (2021-08-03T14:07:10Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.