Related papers: SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

URL: http://arxiv.org/abs/2307.08476v1
Date: Mon, 17 Jul 2023 13:33:11 GMT
Title: SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
Authors: Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin
Abstract summary: We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL). In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE. Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
Score: 110.55093254677638
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Skeleton sequence representation learning has shown great advantages for action recognition due to its promising ability to model human joints and topology. However, the current methods usually require sufficient labeled data for training computationally expensive models, which is labor-intensive and time-consuming. Moreover, these methods ignore how to utilize the fine-grained dependencies among different skeleton joints to pre-train an efficient skeleton sequence learning model that can generalize well across different datasets. In this paper, we propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL). To comprehensively capture the human pose and obtain discriminative skeleton sequence representation, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE, which embeds skeleton joint sequence into Graph Convolutional Network (GCN) and reconstructs the masked skeleton joints and edges based on the prior human topology knowledge. Then, the pre-trained SkeletonMAE encoder is integrated with the Spatial-Temporal Representation Learning (STRL) module to build the SSL framework. Extensive experimental results show that our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods on FineGym, Diving48, NTU 60 and NTU 120 datasets. Additionally, we obtain comparable performance to some fully supervised methods. The code is available at https://github.com/HongYan1123/SkeletonMAE.
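
Illustration: the masking-and-reconstruction idea described in the abstract can be sketched in a few lines. This is not the authors' implementation (the linked repository is the reference for that). The 5-joint toy skeleton, the function names, the zero-masking of joints, and the weight-free propagation step standing in for a learned GCN encoder-decoder are all assumptions made for brevity. The paper also reconstructs masked edges, which this sketch omits.

```python
import math
import random

# Hypothetical 5-joint toy skeleton (not a real dataset layout):
# 0 = torso, 1 = head, 2 = left hand, 3 = right hand, 4 = feet
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a nested list."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def mask_joints(features, mask_ratio, rng):
    """Zero out a random subset of joints; return masked copy and indices."""
    num_masked = max(1, int(round(mask_ratio * len(features))))
    masked_idx = sorted(rng.sample(range(len(features)), num_masked))
    masked = [[0.0] * len(row) if i in masked_idx else list(row)
              for i, row in enumerate(features)]
    return masked, masked_idx


def propagate(adj, features):
    """One weight-free graph convolution step: H = A_hat @ X."""
    dim = len(features[0])
    return [[sum(adj[i][k] * features[k][d] for k in range(len(features)))
             for d in range(dim)] for i in range(len(adj))]


def masked_mse(pred, target, masked_idx):
    """Mean squared error computed only over the masked joints."""
    total = sum((pred[i][d] - target[i][d]) ** 2
                for i in masked_idx for d in range(len(target[i])))
    return total / (len(masked_idx) * len(target[0]))


rng = random.Random(0)
# Random (x, y, z) coordinates stand in for one frame of skeleton data.
joints = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(NUM_JOINTS)]
adj = normalized_adjacency(NUM_JOINTS, EDGES)
masked, masked_idx = mask_joints(joints, mask_ratio=0.4, rng=rng)
recon = propagate(adj, masked)  # masked joints receive neighbour information
loss = masked_mse(recon, joints, masked_idx)
print("masked joints:", masked_idx, "reconstruction loss:", round(loss, 4))
```

In a real pre-training setup the propagation step would carry learned weights and the loss would be minimized by gradient descent; here it only shows where the skeleton topology enters the computation and why the loss is restricted to the masked joints.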

Related papers

Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification [60.939250172443586]
Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios. Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning. This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos). MoCos exploits structure-specific and gait-related body relations as well as features of skeleton graphs to learn effective skeleton representations for person re-ID.
arXiv Detail & Related papers (2024-12-12T08:13:29Z)
ReL-SAR: Representation Learning for Skeleton Action Recognition with Convolutional Transformers and BYOL [6.603505460200282]
Unsupervised representation learning is of prime importance to leverage unlabeled skeleton data. We design a lightweight convolutional transformer framework, named ReL-SAR, for jointly modeling spatial and temporal cues in skeleton sequences. We capitalize on Bootstrap Your Own Latent (BYOL) to learn robust representations from unlabeled skeleton sequence data.
arXiv Detail & Related papers (2024-09-09T16:03:26Z)
Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance. Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework. Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification [70.90142717649785]
This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons. By converting original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes to capture more effective skeleton features for person re-ID.
arXiv Detail & Related papers (2023-07-24T16:18:22Z)
One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching [77.6989219290789]
One-shot skeleton action recognition aims to learn a skeleton action recognition model with a single training sample. This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching.
arXiv Detail & Related papers (2023-07-14T11:52:10Z)
SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised Skeleton Action Recognition [13.283178393519234]
Self-supervised skeleton-based action recognition has attracted increasing attention. By utilizing unlabeled data, more generalizable features can be learned to alleviate the overfitting problem. We propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition.
arXiv Detail & Related papers (2022-09-01T20:54:27Z)
SimMC: Simple Masked Contrastive Learning of Skeleton Representations for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID. Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme. Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition. MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z)
Predictively Encoded Graph Convolutional Network for Noise-Robust Skeleton-based Action Recognition [6.729108277517129]
We propose a skeleton-based action recognition method which is robust to noise information of given skeleton features. Our approach achieves outstanding performance when skeleton samples are noised compared with existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-17T03:37:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.