Navigating Open Set Scenarios for Skeleton-based Action Recognition
- URL: http://arxiv.org/abs/2312.06330v1
- Date: Mon, 11 Dec 2023 12:29:32 GMT
- Title: Navigating Open Set Scenarios for Skeleton-based Action Recognition
- Authors: Kunyu Peng, Cheng Yin, Junwei Zheng, Ruiping Liu, David Schneider,
Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina
Roitberg
- Abstract summary: We tackle the unexplored Open-Set Skeleton-based Action Recognition (OS-SAR) task.
We propose a distance-based cross-modality ensemble method that leverages the cross-modal alignment of skeleton joints, bones, and velocities.
- Score: 45.488649741347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world scenarios, human actions often fall outside the distribution of
training data, making it crucial for models to recognize known actions and
reject unknown ones. However, using pure skeleton data in such open-set
conditions poses challenges due to the lack of visual background cues and the
distinct sparse structure of body pose sequences. In this paper, we tackle the
unexplored Open-Set Skeleton-based Action Recognition (OS-SAR) task and
formalize the benchmark on three skeleton-based datasets. We assess the
performance of seven established open-set approaches on our task and identify
their limits and critical generalization issues when dealing with skeleton
information. To address these challenges, we propose a distance-based
cross-modality ensemble method that leverages the cross-modal alignment of
skeleton joints, bones, and velocities to achieve superior open-set recognition
performance. We refer to the key idea as CrossMax - an approach that utilizes a
novel cross-modality mean max discrepancy suppression mechanism to align latent
spaces during training and a cross-modality distance-based logits refinement
method during testing. CrossMax outperforms existing approaches and
consistently yields state-of-the-art results across all datasets and backbones.
The benchmark, code, and models will be released at
https://github.com/KPeng9510/OS-SAR.
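The cross-modality distance-based logits refinement described in the abstract can be illustrated with a minimal sketch. Note this is a hypothetical simplification, not the authors' exact CrossMax procedure: each modality's logits (joints, bones, velocities) are down-weighted by the distance of its test feature to the nearest known-class prototype, so samples far from all prototypes in every modality end up with uniformly low scores and are easier to reject as unknown. The function name and the `1/(1+d)` weighting are illustrative assumptions.

```python
import numpy as np

def refine_logits(logits_by_modality, feats_by_modality, prototypes_by_modality):
    """Hypothetical sketch of distance-based cross-modality logits refinement.

    Each element of the three lists corresponds to one modality
    (e.g. joints, bones, velocities). `prototypes` is a (C, D) array of
    per-class feature prototypes for that modality.
    """
    refined = []
    for logits, feat, protos in zip(logits_by_modality,
                                    feats_by_modality,
                                    prototypes_by_modality):
        # Distance from the test feature to every known-class prototype.
        d = np.linalg.norm(protos - feat, axis=1)
        # Closer nearest prototype -> higher confidence weight (assumed form).
        conf = 1.0 / (1.0 + d.min())
        refined.append(conf * logits)
    # Ensemble: average the distance-weighted logits across modalities.
    return np.mean(refined, axis=0)
```

A sample whose features lie near a known-class prototype keeps high logits; an out-of-distribution sample far from all prototypes is suppressed across every modality.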
Related papers
- Spatial Hierarchy and Temporal Attention Guided Cross Masking for Self-supervised Skeleton-based Action Recognition [4.036669828958854]
We introduce a hierarchy and attention guided cross-masking framework (HA-CM) that applies masking to skeleton sequences from both spatial and temporal perspectives.
In spatial graphs, we utilize hyperbolic space to maintain joint distinctions and effectively preserve the hierarchical structure of high-dimensional skeletons.
In temporal flows, we substitute traditional distance metrics with the global attention of joints for masking, addressing the convergence of distances in high-dimensional space and the lack of a global perspective.
arXiv Detail & Related papers (2024-09-26T15:28:25Z) - Exploring Self-Supervised Skeleton-Based Human Action Recognition under Occlusions [40.322770236718775]
We propose a method to integrate self-supervised skeleton-based action recognition methods into autonomous robotic systems.
We first pre-train using occluded skeleton sequences, then apply k-means clustering to the sequence embeddings to group semantically similar samples.
Imputing incomplete skeleton sequences to create relatively complete sequences provides significant benefits to existing skeleton-based self-supervised methods.
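The clustering step above can be sketched with a minimal k-means implementation, as a stand-in for grouping sequence embeddings into semantically similar clusters. This is a generic sketch, not the paper's pipeline; the function signature and iteration count are assumptions.

```python
import numpy as np

def kmeans(embeddings, k, iters=20, seed=0):
    """Minimal k-means over (N, D) sequence embeddings.

    Returns per-sample cluster labels and the (k, D) cluster centers.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct random embeddings.
    centers = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        dists = np.linalg.norm(embeddings[:, None] - centers[None], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        centers = np.array([
            embeddings[labels == c].mean(axis=0) if np.any(labels == c)
            else centers[c]
            for c in range(k)
        ])
    return labels, centers
```

In practice a library implementation (e.g. scikit-learn's `KMeans`) would be used; the sketch only shows the grouping idea.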
arXiv Detail & Related papers (2023-09-21T12:51:11Z) - Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve the imprecise action-boundary predictions of existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z) - SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence
Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z) - One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton
Matching [77.6989219290789]
One-shot skeleton action recognition aims to learn a skeleton action recognition model with a single training sample.
This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching.
arXiv Detail & Related papers (2023-07-14T11:52:10Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are available only on the source dataset and unavailable on the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
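The temporal-permutation pretext task can be sketched as follows. This is a hedged illustration, assuming three equal segments and a permutation-index classification label; the actual task design in the paper may differ.

```python
import numpy as np
from itertools import permutations

# All orderings of 3 segments form the label set of the pretext task.
PERMS = list(permutations(range(3)))

def make_permutation_sample(sequence, rng):
    """Hypothetical sketch: split a sequence of shape (T, ...) into 3
    equal temporal segments, shuffle them with a random permutation,
    and return the shuffled sequence together with the permutation
    index, which serves as the self-supervised classification label.
    """
    segs = np.array_split(sequence, 3)
    label = int(rng.integers(len(PERMS)))
    shuffled = np.concatenate([segs[i] for i in PERMS[label]])
    return shuffled, label
```

A classifier trained to recover `label` from `shuffled` must learn temporal structure, which is the point of the pretext task.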
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Joint-bone Fusion Graph Convolutional Network for Semi-supervised
Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z) - UNIK: A Unified Framework for Real-world Skeleton-based Action
Recognition [11.81043814295441]
We introduce UNIK, a novel skeleton-based action recognition method that is able to generalize across datasets.
To study the cross-domain generalizability of action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK.
Results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms the state of the art when transferred onto four target action classification datasets.
arXiv Detail & Related papers (2021-07-19T02:00:28Z) - Predictively Encoded Graph Convolutional Network for Noise-Robust
Skeleton-based Action Recognition [6.729108277517129]
We propose a skeleton-based action recognition method which is robust to noise information of given skeleton features.
Our approach achieves outstanding performance on noised skeleton samples compared with existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-17T03:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.