SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training
- URL: http://arxiv.org/abs/2307.08476v1
- Date: Mon, 17 Jul 2023 13:33:11 GMT
- Title: SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training
- Authors: Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin
- Abstract summary: We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
- Score: 110.55093254677638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skeleton sequence representation learning has shown great advantages for
action recognition due to its promising ability to model human joints and
topology. However, the current methods usually require sufficient labeled data
for training computationally expensive models, which is labor-intensive and
time-consuming. Moreover, these methods ignore how to utilize the fine-grained
dependencies among different skeleton joints to pre-train an efficient skeleton
sequence learning model that can generalize well across different datasets. In
this paper, we propose an efficient skeleton sequence learning framework, named
Skeleton Sequence Learning (SSL). To comprehensively capture the human pose and
obtain discriminative skeleton sequence representation, we build an asymmetric
graph-based encoder-decoder pre-training architecture named SkeletonMAE, which
embeds skeleton joint sequence into Graph Convolutional Network (GCN) and
reconstructs the masked skeleton joints and edges based on the prior human
topology knowledge. Then, the pre-trained SkeletonMAE encoder is integrated
with the Spatial-Temporal Representation Learning (STRL) module to build the
SSL framework. Extensive experimental results show that our SSL generalizes
well across different datasets and outperforms the state-of-the-art
self-supervised skeleton-based action recognition methods on FineGym, Diving48,
NTU 60 and NTU 120 datasets. Additionally, we obtain comparable performance to
some fully supervised methods. The code is available at
https://github.com/HongYan1123/SkeletonMAE.
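
The masked-reconstruction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 5-joint toy skeleton, the function names, and the 0.4 mask ratio are all assumptions made for illustration. A parameter-free neighbourhood average stands in for the learned GCN encoder and decoder, and only joint reconstruction is shown, not edge reconstruction.

```python
import random

# Hypothetical 5-joint toy skeleton (not a real dataset layout); edges follow body topology.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Row-normalized adjacency with self-loops: A_hat = D^-1 (A + I)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def mask_joints(joints, mask_ratio, rng):
    """Zero out a random subset of joints; return the masked copy and the masked indices."""
    num_masked = max(1, int(round(mask_ratio * len(joints))))
    masked_idx = sorted(rng.sample(range(len(joints)), num_masked))
    masked = [[0.0] * len(j) if i in masked_idx else list(j) for i, j in enumerate(joints)]
    return masked, masked_idx


def graph_aggregate(adj_hat, features):
    """One parameter-free graph convolution step: each joint averages its neighbourhood."""
    dim = len(features[0])
    return [
        [sum(adj_hat[i][k] * features[k][d] for k in range(len(features))) for d in range(dim)]
        for i in range(len(features))
    ]


def masked_reconstruction_loss(pred, target, masked_idx):
    """Mean squared error computed only over the masked joints."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


rng = random.Random(0)
# One frame of 3D joint coordinates (random values, purely illustrative).
joints = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(NUM_JOINTS)]
adj_hat = normalized_adjacency(NUM_JOINTS, EDGES)
masked, masked_idx = mask_joints(joints, mask_ratio=0.4, rng=rng)
pred = graph_aggregate(adj_hat, masked)  # stand-in for a learned encoder + decoder
loss = masked_reconstruction_loss(pred, joints, masked_idx)
print(masked_idx, loss)
```

In a trained model the aggregation step would carry learnable weights, and the loss on masked joints would drive pre-training. Here it only shows how the skeleton topology lets visible neighbours inform the prediction for a masked joint.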
Related papers
- Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification [60.939250172443586]
Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios.
Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning.
This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos).
MoCos exploits structure-specific and gait-related body relations as well as features of skeleton graphs to learn effective skeleton representations for person re-ID.
arXiv Detail & Related papers (2024-12-12T08:13:29Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification [70.90142717649785]
This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons.
By converting original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes to capture more effective skeleton features for person re-ID.
arXiv Detail & Related papers (2023-07-24T16:18:22Z)
- SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised Skeleton Action Recognition [13.283178393519234]
Self-supervised skeleton-based action recognition has attracted more attention.
By utilizing unlabeled data, more generalizable features can be learned to alleviate the overfitting problem.
We propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition.
arXiv Detail & Related papers (2022-09-01T20:54:27Z)
- SimMC: Simple Masked Contrastive Learning of Skeleton Representations for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z)
- Predictively Encoded Graph Convolutional Network for Noise-Robust Skeleton-based Action Recognition [6.729108277517129]
We propose a skeleton-based action recognition method which is robust to noise information of given skeleton features.
Our approach achieves outstanding performance when skeleton samples are noised compared with existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-17T03:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.