Unveiling the Hidden Realm: Self-supervised Skeleton-based Action
Recognition in Occluded Environments
- URL: http://arxiv.org/abs/2309.12029v1
- Date: Thu, 21 Sep 2023 12:51:11 GMT
- Title: Unveiling the Hidden Realm: Self-supervised Skeleton-based Action
Recognition in Occluded Environments
- Authors: Yifei Chen, Kunyu Peng, Alina Roitberg, David Schneider, Jiaming
Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer
Stiefelhagen
- Abstract summary: We propose a simple and effective method to empower robots with the capacity to address occlusion.
We first pre-train using occluded skeleton sequences, then use k-means clustering (KMeans) on sequence embeddings to group semantically similar samples.
We then employ K-nearest-neighbor (KNN) to fill in missing skeleton data based on the closest sample neighbors.
- Score: 41.664437160034176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To integrate action recognition methods into autonomous robotic systems, it
is crucial to consider adverse situations involving target occlusions. Such a
scenario, despite its practical relevance, is rarely addressed in existing
self-supervised skeleton-based action recognition methods. To empower robots
with the capacity to address occlusion, we propose a simple and effective
method. We first pre-train using occluded skeleton sequences, then use k-means
clustering (KMeans) on sequence embeddings to group semantically similar
samples. Next, we employ K-nearest-neighbor (KNN) to fill in missing skeleton
data based on the closest sample neighbors. Imputing incomplete skeleton
sequences to create relatively complete sequences as input provides significant
benefits to existing skeleton-based self-supervised models. Meanwhile, building
on the state-of-the-art Partial Spatio-Temporal Learning (PSTL), we introduce
an Occluded Partial Spatio-Temporal Learning (OPSTL) framework. This
enhancement utilizes Adaptive Spatial Masking (ASM) for better use of
high-quality, intact skeletons. The effectiveness of our imputation methods is
verified on the challenging occluded versions of the NTURGB+D 60 and NTURGB+D
120. The source code will be made publicly available at
https://github.com/cyfml/OPSTL.
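The two-stage imputation described in the abstract (cluster sequence embeddings with KMeans, then fill missing joints from the nearest neighbors) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `kmeans_labels` and `knn_impute`, the array layout `(N, T, J, C)`, the boolean visibility mask, the farthest-point initialisation, and the choice to average neighbor coordinates are all assumptions made for illustration. The embeddings are toy values standing in for the output of a pre-trained encoder.

```python
import numpy as np

def kmeans_labels(emb, k, iters=10):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [emb[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = emb[labels == c].mean(axis=0)
    return labels

def knn_impute(skeletons, mask, emb, labels, n_neighbors=3):
    """Fill occluded joints from the nearest same-cluster sequences.

    skeletons: (N, T, J, C) float array of joint coordinates
    mask:      (N, T, J) bool array, True where a joint was observed
    emb:       (N, D) sequence embeddings (assumed to come from an encoder)
    """
    filled = skeletons.copy()
    for i in range(len(skeletons)):
        same = np.flatnonzero(labels == labels[i])
        same = same[same != i]              # exclude the sample itself
        if same.size == 0:
            continue
        dist = np.linalg.norm(emb[same] - emb[i], axis=1)
        nbrs = same[np.argsort(dist)[:n_neighbors]]
        # Average neighbor coordinates, counting only joints the neighbor observed.
        w = mask[nbrs][..., None].astype(float)          # (n, T, J, 1)
        vals = np.where(w > 0, skeletons[nbrs], 0.0)     # (n, T, J, C)
        total, count = vals.sum(axis=0), w.sum(axis=0)
        est = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
        fill = ~mask[i] & (count[..., 0] > 0)            # (T, J)
        filled[i][fill] = est[fill]
    return filled

# Toy data: two well-separated groups of sequences (values 1.0 and 5.0).
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
skeletons = np.ones((6, 2, 3, 3))
skeletons[3:] = 5.0
mask = np.ones((6, 2, 3), dtype=bool)
mask[0, :, 1] = False          # occlude joint 1 of sequence 0 in every frame
skeletons[0, :, 1] = 0.0

labels = kmeans_labels(emb, k=2)
filled = knn_impute(skeletons, mask, emb, labels, n_neighbors=2)
print(labels, filled[0, 0, 1])
```

Restricting the neighbor search to the sample's own cluster keeps the donors semantically similar, which is the motivation the abstract gives for clustering before imputing.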
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z)
- Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep
Skeleton Features [3.255030588361124]
Unsupervised anomaly action recognition identifies video-level abnormal-human-behavior events in an unsupervised manner without abnormal samples.
We present a unified, user prompt-guided zero-shot learning framework using a target domain-independent skeleton feature extractor.
We incorporate a similarity score between the user prompt embeddings and skeleton features aligned in the common space into the anomaly score, which indirectly supplements normal actions.
arXiv Detail & Related papers (2023-03-27T12:59:33Z)
- Self-supervised Action Representation Learning from Partial
Spatio-Temporal Skeleton Sequences [29.376328807860993]
We propose a Partial Spatio-Temporal Learning (PSTL) framework to exploit the local relationship between different skeleton joints and video frames.
Our method achieves state-of-the-art performance on NTURGB+D 60, NTURGB+D 120 and PKU-MMD under various downstream tasks.
arXiv Detail & Related papers (2023-02-17T17:35:05Z)
- SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised
Skeleton Action Recognition [13.283178393519234]
Self-supervised skeleton-based action recognition has attracted more attention.
By utilizing unlabeled data, more generalizable features can be learned to alleviate the overfitting problem.
We propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition.
arXiv Detail & Related papers (2022-09-01T20:54:27Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- SimMC: Simple Masked Contrastive Learning of Skeleton Representations
for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
- A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.