SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training
- URL: http://arxiv.org/abs/2307.08476v1
- Date: Mon, 17 Jul 2023 13:33:11 GMT
- Title: SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training
- Authors: Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin
- Abstract summary: We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
- Score: 110.55093254677638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skeleton sequence representation learning has shown great advantages for
action recognition due to its promising ability to model human joints and
topology. However, the current methods usually require sufficient labeled data
for training computationally expensive models, which is labor-intensive and
time-consuming. Moreover, these methods ignore how to utilize the fine-grained
dependencies among different skeleton joints to pre-train an efficient skeleton
sequence learning model that can generalize well across different datasets. In
this paper, we propose an efficient skeleton sequence learning framework, named
Skeleton Sequence Learning (SSL). To comprehensively capture the human pose and
obtain discriminative skeleton sequence representation, we build an asymmetric
graph-based encoder-decoder pre-training architecture named SkeletonMAE, which
embeds skeleton joint sequence into Graph Convolutional Network (GCN) and
reconstructs the masked skeleton joints and edges based on the prior human
topology knowledge. Then, the pre-trained SkeletonMAE encoder is integrated
with the Spatial-Temporal Representation Learning (STRL) module to build the
SSL framework. Extensive experimental results show that our SSL generalizes
well across different datasets and outperforms the state-of-the-art
self-supervised skeleton-based action recognition methods on FineGym, Diving48,
NTU 60 and NTU 120 datasets. Additionally, we obtain comparable performance to
some fully supervised methods. The code is available at
https://github.com/HongYan1123/SkeletonMAE.
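To make the pre-training idea concrete, below is a minimal sketch of graph-based masked joint reconstruction in PyTorch. It is not the authors' implementation (that is in the repository above): the class names (SimpleGCNLayer, mask_joints), the identity adjacency placeholder, the masking ratio, and all dimensions are illustrative assumptions, and edge reconstruction plus the STRL fine-tuning stage are omitted.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: aggregate joint features over a fixed,
    row-normalized adjacency derived from the human joint topology."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj / adj.sum(dim=-1, keepdim=True))
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):  # x: (batch, frames, joints, dim)
        x = torch.einsum("ij,btjd->btid", self.adj, x)  # neighborhood aggregation
        return torch.relu(self.proj(x))

def mask_joints(x, mask_ratio=0.4):
    """Randomly zero out a fraction of joints per sample (the masking step)."""
    b, _, j, _ = x.shape
    mask = torch.rand(b, 1, j, 1, device=x.device) < mask_ratio
    return x.masked_fill(mask, 0.0), mask

# Asymmetric encoder-decoder: a deeper GCN encoder and a shallow linear decoder
# that regresses the original 3D joint coordinates from the masked sequence.
num_joints, coord_dim, hidden = 25, 3, 64
adj = torch.eye(num_joints)  # placeholder; a real run would encode bone connectivity
encoder = nn.Sequential(SimpleGCNLayer(coord_dim, hidden, adj),
                        SimpleGCNLayer(hidden, hidden, adj))
decoder = nn.Linear(hidden, coord_dim)

x = torch.randn(8, 30, num_joints, coord_dim)  # toy batch: 8 clips, 30 frames
x_masked, mask = mask_joints(x)
recon = decoder(encoder(x_masked))

# Reconstruction loss computed only on the masked joints.
mask_f = mask.float().expand_as(x)
loss = ((recon - x) ** 2 * mask_f).sum() / mask_f.sum().clamp(min=1.0)
loss.backward()
```

In the paper, the masking and reconstruction are additionally guided by prior human topology (both joints and edges are reconstructed), and the pre-trained encoder is then combined with the STRL module for downstream action recognition.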
Related papers
- ReL-SAR: Representation Learning for Skeleton Action Recognition with Convolutional Transformers and BYOL [6.603505460200282]
Unsupervised representation learning is of prime importance to leverage unlabeled skeleton data.
We design a lightweight convolutional transformer framework, named ReL-SAR, for jointly modeling spatial and temporal cues in skeleton sequences.
We capitalize on Bootstrap Your Own Latent (BYOL) to learn robust representations from unlabeled skeleton sequence data.
arXiv Detail & Related papers (2024-09-09T16:03:26Z) - Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z) - Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard
Skeleton Mining for Unsupervised Person Re-Identification [70.90142717649785]
This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons.
By converting original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes to capture more effective skeleton features for person re-ID.
arXiv Detail & Related papers (2023-07-24T16:18:22Z) - One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton
Matching [77.6989219290789]
One-shot skeleton action recognition aims to learn a skeleton action recognition model with a single training sample.
This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching.
arXiv Detail & Related papers (2023-07-14T11:52:10Z) - SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised
Skeleton Action Recognition [13.283178393519234]
Self-supervised skeleton-based action recognition has attracted increasing attention.
By utilizing unlabeled data, more generalizable features can be learned to alleviate the overfitting problem.
We propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition.
arXiv Detail & Related papers (2022-09-01T20:54:27Z) - SimMC: Simple Masked Contrastive Learning of Skeleton Representations
for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z) - Multi-Scale Semantics-Guided Neural Networks for Efficient
Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z) - Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z) - Predictively Encoded Graph Convolutional Network for Noise-Robust
Skeleton-based Action Recognition [6.729108277517129]
We propose a skeleton-based action recognition method that is robust to noise in the given skeleton features.
Our approach achieves outstanding performance on noisy skeleton samples compared with existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-17T03:37:36Z)