Exploring Self-Supervised Skeleton-Based Human Action Recognition under Occlusions
- URL: http://arxiv.org/abs/2309.12029v2
- Date: Wed, 23 Oct 2024 04:36:26 GMT
- Title: Exploring Self-Supervised Skeleton-Based Human Action Recognition under Occlusions
- Authors: Yifei Chen, Kunyu Peng, Alina Roitberg, David Schneider, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen,
- Abstract summary: We propose a method to integrate self-supervised skeleton-based action recognition methods into autonomous robotic systems.
We first pre-train using occluded skeleton sequences, then use k-means clustering (KMeans) on sequence embeddings to group semantically similar samples.
Imputing incomplete skeleton sequences to create relatively complete sequences provides significant benefits to existing skeleton-based self-supervised methods.
- Score: 40.322770236718775
- License:
- Abstract: To integrate self-supervised skeleton-based action recognition methods into autonomous robotic systems, it is crucial to consider adverse situations involving target occlusions. Such a scenario, despite its practical relevance, is rarely addressed in existing self-supervised skeleton-based action recognition methods. To empower models with the capacity to address occlusion, we propose a simple and effective method. We first pre-train using occluded skeleton sequences, then use k-means clustering (KMeans) on sequence embeddings to group semantically similar samples. Next, we propose KNN-Imputation to fill in missing skeleton data based on the closest sample neighbors. Imputing incomplete skeleton sequences to create relatively complete sequences as input provides significant benefits to existing skeleton-based self-supervised methods. Meanwhile, building on the state-of-the-art Partial Spatio-Temporal Learning (PSTL), we introduce an Occluded Partial Spatio-Temporal Learning (OPSTL) framework. This enhancement utilizes Adaptive Spatial Masking (ASM) for better use of high-quality, intact skeletons. The new proposed method is verified on the challenging occluded versions of the NTURGB+D 60 and NTURGB+D 120. The source code is publicly available at https://github.com/cyfml/OPSTL.
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z) - SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL)
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z) - Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep
Skeleton Features [3.255030588361124]
Unsupervised anomaly action recognition identifies video-level abnormal-human-behavior events in an unsupervised manner without abnormal samples.
We present a unified, user prompt-guided zero-shot learning framework using a target domain-independent skeleton feature extractor.
We incorporate a similarity score between the user prompt embeddings and skeleton features aligned in the common space into the anomaly score, which indirectly supplements normal actions.
arXiv Detail & Related papers (2023-03-27T12:59:33Z) - Self-supervised Action Representation Learning from Partial
Spatio-Temporal Skeleton Sequences [29.376328807860993]
We propose a Partial Spatio-Temporal Learning (PSTL) framework to exploit the local relationship between different skeleton joints and video frames.
Our method achieves state-of-the-art performance on NTURGB+D 60, NTURGBMM+D 120 and PKU-D under various downstream tasks.
arXiv Detail & Related papers (2023-02-17T17:35:05Z) - SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised
Skeleton Action Recognition [13.283178393519234]
Self-supervised skeleton-based action recognition has attracted more attention.
With utilizing the unlabeled data, more generalizable features can be learned to alleviate the overfitting problem.
We propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition.
arXiv Detail & Related papers (2022-09-01T20:54:27Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - SimMC: Simple Masked Contrastive Learning of Skeleton Representations
for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z) - A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.