Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation
- URL: http://arxiv.org/abs/2512.11458v1
- Date: Fri, 12 Dec 2025 10:53:51 GMT
- Title: Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation
- Authors: Jingmin Zhu, Anqi Zhu, Hossein Rahmani, Jun Liu, Mohammed Bennamoun, Qiuhong Ke
- Abstract summary: We introduce Skeleton-Cache, the first training-free test-time adaptation framework for skeleton-based zero-shot action recognition (SZAR). Skeleton-Cache reformulates inference as a lightweight retrieval process over a non-parametric cache. Experiments on NTU RGB+D 60/120 and PKU-MMD II demonstrate that Skeleton-Cache consistently boosts the performance of various SZAR backbones.
- Score: 52.02799244361572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Skeleton-Cache, the first training-free test-time adaptation framework for skeleton-based zero-shot action recognition (SZAR), aimed at improving model generalization to unseen actions during inference. Skeleton-Cache reformulates inference as a lightweight retrieval process over a non-parametric cache that stores structured skeleton representations, combining both global and fine-grained local descriptors. To guide the fusion of descriptor-wise predictions, we leverage the semantic reasoning capabilities of large language models (LLMs) to assign class-specific importance weights. By integrating these structured descriptors with LLM-guided semantic priors, Skeleton-Cache dynamically adapts to unseen actions without any additional training or access to training data. Extensive experiments on NTU RGB+D 60/120 and PKU-MMD II demonstrate that Skeleton-Cache consistently boosts the performance of various SZAR backbones under both zero-shot and generalized zero-shot settings. The code is publicly available at https://github.com/Alchemist0754/Skeleton-Cache.
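The abstract does not spell out the retrieval step, but training-free cache models of this kind (e.g., Tip-Adapter-style adapters) suggest a minimal sketch: descriptor-wise affinities against a cache of stored skeleton features yield class scores, which are fused with the backbone's zero-shot logits under LLM-derived per-class weights. All function names, the affinity kernel, and the fusion rule below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cache_logits(query, cache_keys, cache_labels, num_classes, beta=5.0):
    """Retrieve class scores from a non-parametric cache.

    query:        (D,) one descriptor of the test skeleton
    cache_keys:   (N, D) stored descriptors, L2-normalized
    cache_labels: (N,) integer pseudo-labels of cached samples
    """
    affinity = cache_keys @ F.normalize(query, dim=0)        # (N,) cosine similarity
    weights = torch.exp(-beta * (1.0 - affinity))            # sharpened affinity kernel
    one_hot = F.one_hot(cache_labels, num_classes).float()   # (N, C)
    return weights @ one_hot                                 # (C,) class scores

def skeleton_cache_predict(descriptors, caches, zs_logits, llm_weights, alpha=0.5):
    """Fuse descriptor-wise cache predictions with zero-shot logits.

    descriptors: dict name -> (D,) global/local descriptors of the query
    caches:      dict name -> (keys, labels) per-descriptor cache
    zs_logits:   (C,) logits from the frozen SZAR backbone
    llm_weights: dict name -> (C,) LLM-assigned class importance per descriptor
    """
    fused = torch.zeros_like(zs_logits)
    for name, q in descriptors.items():
        keys, labels = caches[name]
        fused = fused + llm_weights[name] * cache_logits(q, keys, labels, zs_logits.shape[0])
    return zs_logits + alpha * fused
```

In this reading, the cache is populated with test-time samples and their pseudo-labels, so no training data or gradient updates are needed; only the blend factor alpha and kernel sharpness beta are tuned.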
Related papers
- DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition [51.80782323686666]
We introduce DynaPURLS, a unified framework that establishes robust, multi-scale visual-semantic correspondences. Our framework leverages a large language model to generate hierarchical textual descriptions that encompass both global movements and local body-part dynamics. Experiments on three large-scale benchmark datasets, including NTU RGB+D 60/120 and PKU-MMD, demonstrate that DynaPURLS significantly outperforms prior art.
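A hedged sketch of the hierarchical-description idea: build one global and one part-level text prototype per class from LLM-generated descriptions. The prompt templates, the part list, and the plain averaging are illustrative assumptions, not DynaPURLS's actual pipeline.

```python
import torch

def build_class_prototypes(embed, class_names, part_names):
    """Build global and part-level text prototypes per action class.

    embed:       callable str -> (D,) text embedding (e.g., a CLIP text encoder)
    class_names: list of action labels
    part_names:  list of body parts, e.g. ['head', 'arms', 'torso', 'legs']
    """
    global_protos, part_protos = [], []
    for name in class_names:
        global_protos.append(embed(f"a person {name}, overall body movement"))
        parts = [embed(f"during {name}, motion of the {p}") for p in part_names]
        part_protos.append(torch.stack(parts).mean(dim=0))  # average the part views
    return torch.stack(global_protos), torch.stack(part_protos)
```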
arXiv Detail & Related papers (2025-12-12T10:39:10Z)
- MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition [49.91188543847175]
Multi-Skeleton Contrastive Learning (MS-CLR) is a framework that aligns pose representations across multiple skeleton conventions extracted from the same sequence. MS-CLR consistently improves performance over strong single-skeleton contrastive learning baselines. A multi-skeleton ensemble further boosts performance, setting new state-of-the-art results on both datasets.
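A minimal sketch of the cross-convention objective described above, assuming embeddings of the same sequences under two skeleton conventions; the symmetric InfoNCE form is a standard contrastive choice, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def cross_skeleton_infonce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE between embeddings of the same sequences
    under two skeleton conventions (e.g., 25-joint vs. 17-joint).

    z_a, z_b: (B, D) batch embeddings; row i of each comes from
    the same underlying sequence, so positives lie on the diagonal.
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))       # matching rows are positives
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```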
arXiv Detail & Related papers (2025-08-20T17:58:03Z)
- Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition [57.97930719585095]
We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
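The local-plus-global alignment can be illustrated by scoring whole-body and body-part skeleton features against matching text embeddings; the part dictionary and the uniform averaging below are assumptions, not PURLS's learned projection modules.

```python
import torch
import torch.nn.functional as F

def part_aware_scores(global_feat, part_feats, global_text, part_texts):
    """Zero-shot class scores from global and local visual-semantic alignment.

    global_feat: (D,)                 whole-body skeleton feature
    part_feats:  dict part -> (D,)    body-part features (e.g., 'arms', 'legs')
    global_text: (C, D)               text embeddings of class descriptions
    part_texts:  dict part -> (C, D)  text embeddings of part-level descriptions
    """
    score = F.normalize(global_text, dim=1) @ F.normalize(global_feat, dim=0)
    for part, feat in part_feats.items():
        score = score + F.normalize(part_texts[part], dim=1) @ F.normalize(feat, dim=0)
    return score / (1 + len(part_feats))      # average global and local scores
```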
arXiv Detail & Related papers (2024-06-19T08:22:32Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
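The contextualized-target idea follows the data2vec recipe: a momentum (EMA) teacher encodes the full sequence and the student regresses the teacher's features at masked positions. The sketch assumes generic student/teacher encoders mapping (B, T, D) to (B, T, D); the smooth-L1 loss and momentum value are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, m=0.999):
    """Momentum update of the teacher's weights from the student."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

def skeleton2vec_loss(student, teacher, seq, mask):
    """Regress contextualized teacher features at masked positions.

    seq:  (B, T, D) skeleton token sequence
    mask: (B, T) boolean, True where the student's input is masked
    """
    with torch.no_grad():
        target = teacher(seq)                             # contextualized targets
    pred = student(seq.masked_fill(mask.unsqueeze(-1), 0.0))
    return F.smooth_l1_loss(pred[mask], target[mask])     # loss on masked tokens only
```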
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Exploring Self-supervised Skeleton-based Action Recognition in Occluded Environments [40.322770236718775]
We propose IosPSTL, a simple and effective self-supervised learning framework designed to handle occlusions. IosPSTL combines a cluster-agnostic KNN imputer with an Occluded Partial Spatio-Temporal Learning (OPSTL) strategy. The OPSTL module incorporates Adaptive Spatial Masking (ASM) to make better use of intact, high-quality skeleton sequences during training.
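A minimal illustration of KNN-based joint imputation: occluded joint coordinates are filled from the nearest complete sequences. The flattened Euclidean distance and k value are assumptions, not the paper's cluster-agnostic imputer.

```python
import numpy as np

def knn_impute_joints(seqs, occ_masks, k=3):
    """Fill occluded joint coordinates from the k nearest sequences.

    seqs:      (N, T, J, 3) skeleton sequences with zeros at occluded joints
    occ_masks: (N, T, J) boolean, True where a joint is occluded
    """
    flat = seqs.reshape(len(seqs), -1)
    filled = seqs.copy()
    for i in range(len(seqs)):
        if not occ_masks[i].any():
            continue
        dists = np.linalg.norm(flat - flat[i], axis=1)
        dists[i] = np.inf                       # exclude the sequence itself
        neighbors = np.argsort(dists)[:k]
        donor = seqs[neighbors].mean(axis=0)    # average the neighbor skeletons
        filled[i][occ_masks[i]] = donor[occ_masks[i]]
    return filled
```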
arXiv Detail & Related papers (2023-09-21T12:51:11Z)
- Multi-Semantic Fusion Model for Generalized Zero-Shot Skeleton-Based Action Recognition [32.291333054680855]
Generalized zero-shot skeleton-based action recognition (GZSSAR) is a new and challenging problem in the computer vision community.
We propose a multi-semantic fusion (MSF) model for improving the performance of GZSSAR.
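The fusion idea can be sketched as averaging class scores over several semantic views of each label (e.g., action name, description, motion cues); the view names and uniform weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multi_semantic_scores(skel_feat, text_embeds, view_weights=None):
    """Average class scores over multiple semantic views of each label.

    skel_feat:   (D,) skeleton feature
    text_embeds: dict view -> (C, D) text embeddings,
                 e.g. keys 'name', 'description', 'motion'
    """
    if view_weights is None:
        view_weights = {v: 1.0 / len(text_embeds) for v in text_embeds}
    q = F.normalize(skel_feat, dim=0)
    score = None
    for view, emb in text_embeds.items():
        s = view_weights[view] * (F.normalize(emb, dim=1) @ q)
        score = s if score is None else score + s
    return score
```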
arXiv Detail & Related papers (2023-09-18T09:00:25Z)
- SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
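A generic masked-reconstruction step for skeleton tokens illustrates the pre-training objective; the random masking and MSE loss below are standard MAE choices rather than SkeletonMAE's graph-aware masking.

```python
import torch
import torch.nn.functional as F

def skeleton_mae_step(encoder, decoder, seq, mask_ratio=0.75):
    """One masked-reconstruction step on a skeleton token sequence.

    seq: (B, T, D) tokens (e.g., per-frame joint embeddings)
    encoder/decoder: modules mapping (B, T, D) -> (B, T, D)
    """
    B, T, D = seq.shape
    mask = torch.rand(B, T) < mask_ratio              # True = masked token
    visible = seq.masked_fill(mask.unsqueeze(-1), 0.0)
    latent = encoder(visible)
    recon = decoder(latent)
    return F.mse_loss(recon[mask], seq[mask])         # reconstruct masked tokens only
```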
arXiv Detail & Related papers (2023-07-17T13:33:11Z)
- Skeleton-based Action Recognition via Adaptive Cross-Form Learning [75.92422282666767]
Skeleton-based action recognition aims to project skeleton sequences to action categories, where sequences are derived from multiple forms of pre-detected points.
Existing methods tend to improve GCNs by leveraging multi-form skeletons due to their complementary cues.
We present Adaptive Cross-Form Learning (ACFL), which empowers well-designed GCNs to generate complementary representation from single-form skeletons.
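The cross-form idea can be sketched as feature mimicry: a single-form (joint) GCN is pushed to match features derived from complementary forms (bones, motion). The MSE mimicry term below is an illustrative stand-in for ACFL's adaptive scheme.

```python
import torch
import torch.nn.functional as F

def cross_form_loss(joint_feat, bone_feat, motion_feat, logits, labels, lam=0.1):
    """Classification plus mimicry of complementary skeleton forms.

    joint_feat:             (B, D) features from the single-form (joint) GCN
    bone_feat, motion_feat: (B, D) target features from other forms (detached)
    logits:                 (B, C) classifier output of the joint-form GCN
    """
    mimic = (F.mse_loss(joint_feat, bone_feat.detach())
             + F.mse_loss(joint_feat, motion_feat.detach()))
    return F.cross_entropy(logits, labels) + lam * mimic
```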
arXiv Detail & Related papers (2022-06-30T07:40:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.