Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation
- URL: http://arxiv.org/abs/2512.11458v1
- Date: Fri, 12 Dec 2025 10:53:51 GMT
- Title: Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation
- Authors: Jingmin Zhu, Anqi Zhu, Hossein Rahmani, Jun Liu, Mohammed Bennamoun, Qiuhong Ke
- Abstract summary: We introduce Skeleton-Cache, the first training-free test-time adaptation framework for skeleton-based zero-shot action recognition (SZAR). Skeleton-Cache reformulates inference as a lightweight retrieval process over a non-parametric cache. Experiments on NTU RGB+D 60/120 and PKU-MMD II demonstrate that Skeleton-Cache consistently boosts the performance of various SZAR backbones.
- Score: 52.02799244361572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Skeleton-Cache, the first training-free test-time adaptation framework for skeleton-based zero-shot action recognition (SZAR), aimed at improving model generalization to unseen actions during inference. Skeleton-Cache reformulates inference as a lightweight retrieval process over a non-parametric cache that stores structured skeleton representations, combining both global and fine-grained local descriptors. To guide the fusion of descriptor-wise predictions, we leverage the semantic reasoning capabilities of large language models (LLMs) to assign class-specific importance weights. By integrating these structured descriptors with LLM-guided semantic priors, Skeleton-Cache dynamically adapts to unseen actions without any additional training or access to training data. Extensive experiments on NTU RGB+D 60/120 and PKU-MMD II demonstrate that Skeleton-Cache consistently boosts the performance of various SZAR backbones under both zero-shot and generalized zero-shot settings. The code is publicly available at https://github.com/Alchemist0754/Skeleton-Cache.
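The abstract does not spell out the retrieval step, but training-free cache models of this kind (e.g., Tip-Adapter-style adapters) suggest a minimal sketch: descriptor-wise affinities against a cache of stored skeleton features yield class scores, which are fused with the backbone's zero-shot logits under LLM-derived per-class weights. All function names, the affinity kernel, and the fusion rule below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cache_logits(query, cache_keys, cache_labels, num_classes, beta=5.0):
    """Retrieve class scores from a non-parametric cache.

    query:        (D,) one descriptor of the test skeleton
    cache_keys:   (N, D) stored descriptors, L2-normalized
    cache_labels: (N,) integer pseudo-labels of cached samples
    """
    affinity = cache_keys @ F.normalize(query, dim=0)        # (N,) cosine similarity
    weights = torch.exp(-beta * (1.0 - affinity))            # sharpened affinity kernel
    one_hot = F.one_hot(cache_labels, num_classes).float()   # (N, C)
    return weights @ one_hot                                 # (C,) class scores

def skeleton_cache_predict(descriptors, caches, zs_logits, llm_weights, alpha=0.5):
    """Fuse descriptor-wise cache predictions with zero-shot logits.

    descriptors: dict name -> (D,) global/local descriptors of the query
    caches:      dict name -> (keys, labels) per-descriptor cache
    zs_logits:   (C,) logits from the frozen SZAR backbone
    llm_weights: dict name -> (C,) LLM-assigned class importance per descriptor
    """
    fused = torch.zeros_like(zs_logits)
    for name, q in descriptors.items():
        keys, labels = caches[name]
        fused = fused + llm_weights[name] * cache_logits(q, keys, labels, zs_logits.shape[0])
    return zs_logits + alpha * fused
```

In this reading, the cache is populated with test-time samples and their pseudo-labels, so no training data or gradient updates are needed; only the blend factor alpha and kernel sharpness beta are tuned.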
Related papers
- DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition [51.80782323686666]
We introduce DynaPURLS, a unified framework that establishes robust, multi-scale visual-semantic correspondences. Our framework leverages a large language model to generate hierarchical textual descriptions that encompass both global movements and local body-part dynamics. Experiments on three large-scale benchmark datasets, including NTU RGB+D 60/120 and PKU-MMD, demonstrate that DynaPURLS significantly outperforms prior art.
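A hedged sketch of the hierarchical-description idea: build one global and one part-level text prototype per class from LLM-generated descriptions. The prompt templates, the part list, and the plain averaging are illustrative assumptions, not DynaPURLS's actual pipeline.

```python
import torch

def build_class_prototypes(embed, class_names, part_names):
    """Build global and part-level text prototypes per action class.

    embed:       callable str -> (D,) text embedding (e.g., a CLIP text encoder)
    class_names: list of action labels
    part_names:  list of body parts, e.g. ['head', 'arms', 'torso', 'legs']
    """
    global_protos, part_protos = [], []
    for name in class_names:
        global_protos.append(embed(f"a person {name}, overall body movement"))
        parts = [embed(f"during {name}, motion of the {p}") for p in part_names]
        part_protos.append(torch.stack(parts).mean(dim=0))  # average the part views
    return torch.stack(global_protos), torch.stack(part_protos)
```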
arXiv Detail & Related papers (2025-12-12T10:39:10Z)
- MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition [49.91188543847175]
Multi-Skeleton Contrastive Learning (MS-CLR) is a framework that aligns pose representations across multiple skeleton conventions extracted from the same sequence. MS-CLR consistently improves performance over strong single-skeleton contrastive learning baselines. A multi-skeleton ensemble further boosts performance, setting new state-of-the-art results on both datasets.
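A minimal sketch of the cross-convention objective described above, assuming embeddings of the same sequences under two skeleton conventions; the symmetric InfoNCE form is a standard contrastive choice, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def cross_skeleton_infonce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE between embeddings of the same sequences
    under two skeleton conventions (e.g., 25-joint vs. 17-joint).

    z_a, z_b: (B, D) batch embeddings; row i of each comes from
    the same underlying sequence, so positives lie on the diagonal.
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))       # matching rows are positives
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```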
arXiv Detail & Related papers (2025-08-20T17:58:03Z)
- Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition [57.97930719585095]
We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
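The local-plus-global alignment can be illustrated by scoring whole-body and body-part skeleton features against matching text embeddings; the part dictionary and the uniform averaging below are assumptions, not PURLS's learned projection modules.

```python
import torch
import torch.nn.functional as F

def part_aware_scores(global_feat, part_feats, global_text, part_texts):
    """Zero-shot class scores from global and local visual-semantic alignment.

    global_feat: (D,)                 whole-body skeleton feature
    part_feats:  dict part -> (D,)    body-part features (e.g., 'arms', 'legs')
    global_text: (C, D)               text embeddings of class descriptions
    part_texts:  dict part -> (C, D)  text embeddings of part-level descriptions
    """
    score = F.normalize(global_text, dim=1) @ F.normalize(global_feat, dim=0)
    for part, feat in part_feats.items():
        score = score + F.normalize(part_texts[part], dim=1) @ F.normalize(feat, dim=0)
    return score / (1 + len(part_feats))      # average global and local scores
```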
arXiv Detail & Related papers (2024-06-19T08:22:32Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
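The contextualized-target idea follows the data2vec recipe: a momentum (EMA) teacher encodes the full sequence and the student regresses the teacher's features at masked positions. The sketch assumes generic student/teacher encoders mapping (B, T, D) to (B, T, D); the smooth-L1 loss and momentum value are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, m=0.999):
    """Momentum update of the teacher's weights from the student."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

def skeleton2vec_loss(student, teacher, seq, mask):
    """Regress contextualized teacher features at masked positions.

    seq:  (B, T, D) skeleton token sequence
    mask: (B, T) boolean, True where the student's input is masked
    """
    with torch.no_grad():
        target = teacher(seq)                             # contextualized targets
    pred = student(seq.masked_fill(mask.unsqueeze(-1), 0.0))
    return F.smooth_l1_loss(pred[mask], target[mask])     # loss on masked tokens only
```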
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Exploring Self-supervised Skeleton-based Action Recognition in Occluded Environments [40.322770236718775]
We propose IosPSTL, a simple and effective self-supervised learning framework designed to handle occlusions. IosPSTL combines a cluster-agnostic KNN imputer with an Occluded Partial Spatio-Temporal Learning (OPSTL) strategy. The OPSTL module incorporates Adaptive Spatial Masking (ASM) to make better use of intact, high-quality skeleton sequences during training.
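A minimal illustration of KNN-based joint imputation: occluded joint coordinates are filled from the nearest complete sequences. The flattened Euclidean distance and k value are assumptions, not the paper's cluster-agnostic imputer.

```python
import numpy as np

def knn_impute_joints(seqs, occ_masks, k=3):
    """Fill occluded joint coordinates from the k nearest sequences.

    seqs:      (N, T, J, 3) skeleton sequences with zeros at occluded joints
    occ_masks: (N, T, J) boolean, True where a joint is occluded
    """
    flat = seqs.reshape(len(seqs), -1)
    filled = seqs.copy()
    for i in range(len(seqs)):
        if not occ_masks[i].any():
            continue
        dists = np.linalg.norm(flat - flat[i], axis=1)
        dists[i] = np.inf                       # exclude the sequence itself
        neighbors = np.argsort(dists)[:k]
        donor = seqs[neighbors].mean(axis=0)    # average the neighbor skeletons
        filled[i][occ_masks[i]] = donor[occ_masks[i]]
    return filled
```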
arXiv Detail & Related papers (2023-09-21T12:51:11Z)
- Multi-Semantic Fusion Model for Generalized Zero-Shot Skeleton-Based Action Recognition [32.291333054680855]
Generalized zero-shot skeleton-based action recognition (GZSSAR) is a new and challenging problem in the computer vision community.
We propose a multi-semantic fusion (MSF) model for improving the performance of GZSSAR.
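The fusion idea can be sketched as averaging class scores over several semantic views of each label (e.g., action name, description, motion cues); the view names and uniform weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multi_semantic_scores(skel_feat, text_embeds, view_weights=None):
    """Average class scores over multiple semantic views of each label.

    skel_feat:   (D,) skeleton feature
    text_embeds: dict view -> (C, D) text embeddings,
                 e.g. keys 'name', 'description', 'motion'
    """
    if view_weights is None:
        view_weights = {v: 1.0 / len(text_embeds) for v in text_embeds}
    q = F.normalize(skel_feat, dim=0)
    score = None
    for view, emb in text_embeds.items():
        s = view_weights[view] * (F.normalize(emb, dim=1) @ q)
        score = s if score is None else score + s
    return score
```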
arXiv Detail & Related papers (2023-09-18T09:00:25Z)
- SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
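A generic masked-reconstruction step for skeleton tokens illustrates the pre-training objective; the random masking and MSE loss below are standard MAE choices rather than SkeletonMAE's graph-aware masking.

```python
import torch
import torch.nn.functional as F

def skeleton_mae_step(encoder, decoder, seq, mask_ratio=0.75):
    """One masked-reconstruction step on a skeleton token sequence.

    seq: (B, T, D) tokens (e.g., per-frame joint embeddings)
    encoder/decoder: modules mapping (B, T, D) -> (B, T, D)
    """
    B, T, D = seq.shape
    mask = torch.rand(B, T) < mask_ratio              # True = masked token
    visible = seq.masked_fill(mask.unsqueeze(-1), 0.0)
    latent = encoder(visible)
    recon = decoder(latent)
    return F.mse_loss(recon[mask], seq[mask])         # reconstruct masked tokens only
```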
arXiv Detail & Related papers (2023-07-17T13:33:11Z)
- Skeleton-based Action Recognition via Adaptive Cross-Form Learning [75.92422282666767]
Skeleton-based action recognition aims to project skeleton sequences to action categories, where sequences are derived from multiple forms of pre-detected points.
Existing methods tend to improve GCNs by leveraging multi-form skeletons due to their complementary cues.
We present Adaptive Cross-Form Learning (ACFL), which empowers well-designed GCNs to generate complementary representation from single-form skeletons.
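The cross-form idea can be sketched as feature mimicry: a single-form (joint) GCN is pushed to match features derived from complementary forms (bones, motion). The MSE mimicry term below is an illustrative stand-in for ACFL's adaptive scheme.

```python
import torch
import torch.nn.functional as F

def cross_form_loss(joint_feat, bone_feat, motion_feat, logits, labels, lam=0.1):
    """Classification plus mimicry of complementary skeleton forms.

    joint_feat:             (B, D) features from the single-form (joint) GCN
    bone_feat, motion_feat: (B, D) target features from other forms (detached)
    logits:                 (B, C) classifier output of the joint-form GCN
    """
    mimic = (F.mse_loss(joint_feat, bone_feat.detach())
             + F.mse_loss(joint_feat, motion_feat.detach()))
    return F.cross_entropy(logits, labels) + lam * mimic
```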
arXiv Detail & Related papers (2022-06-30T07:40:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.