Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition
- URL: http://arxiv.org/abs/2404.07487v2
- Date: Mon, 15 Apr 2024 02:25:22 GMT
- Title: Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition
- Authors: Yang Chen, Jingcai Guo, Tian He, Ling Wang
- Abstract summary: We propose a novel method via Side information and dual-prompts learning for skeleton-based zero-shot action recognition (STAR) at the fine-grained level.
Our method achieves state-of-the-art performance in ZSL and GZSL settings on the NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.
- Score: 18.012159340628557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skeleton-based zero-shot action recognition aims to recognize unknown human actions based on the learned priors of the known skeleton-based actions and a semantic descriptor space shared by both known and unknown categories. However, previous works focus on establishing the bridges between the known skeleton representation space and semantic description space at the coarse-grained level for recognizing unknown action categories, ignoring the fine-grained alignment of these two spaces, resulting in suboptimal performance in distinguishing high-similarity action categories. To address these challenges, we propose a novel method via Side information and dual-prompts learning for skeleton-based zero-shot action recognition (STAR) at the fine-grained level. Specifically, 1) we decompose the skeleton into several parts based on its topology structure and introduce the side information concerning multi-part descriptions of human body movements for alignment between the skeleton and the semantic space at the fine-grained level; 2) we design the visual-attribute and semantic-part prompts to improve the intra-class compactness within the skeleton space and inter-class separability within the semantic space, respectively, to distinguish the high-similarity actions. Extensive experiments show that our method achieves state-of-the-art performance in ZSL and GZSL settings on the NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.
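The part-level alignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the joint-to-part grouping in `PARTS`, the function names, the feature dimension, and the random stand-ins for the skeleton and text encoders are all assumptions made for illustration. The sketch pools per-joint features into per-part features, compares each part with a per-part text embedding of every unseen class by cosine similarity, and averages the part scores to pick a class.

```python
import numpy as np

# Hypothetical grouping of 25 skeleton joints into body parts (assumption;
# the partition actually used in the paper is not stated in the abstract).
PARTS = {
    "head": [2, 3, 20],
    "torso": [0, 1],
    "left_arm": [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def part_features(joint_feats):
    """Average-pool per-joint features (J, D) into per-part features (P, D)."""
    return np.stack([joint_feats[idx].mean(axis=0) for idx in PARTS.values()])


def predict_unseen(joint_feats, part_text_emb):
    """Score unseen classes by averaging part-wise cosine similarities.

    joint_feats:   (J, D) per-joint skeleton features (stand-in for an encoder output).
    part_text_emb: (C, P, D) text embeddings of per-part descriptions for C classes.
    Returns the index of the best-scoring class and the (C,) score vector.
    """
    v = l2_normalize(part_features(joint_feats))   # (P, D)
    t = l2_normalize(part_text_emb)                # (C, P, D)
    sims = np.einsum("pd,cpd->cp", v, t)           # cosine similarity per class and part
    scores = sims.mean(axis=1)                     # average over parts
    return int(scores.argmax()), scores


# Toy usage with random stand-ins for both encoders.
rng = np.random.default_rng(0)
num_classes, dim = 3, 16
text_emb = rng.normal(size=(num_classes, len(PARTS), dim))

# Build a skeleton whose part features match class 1's part descriptions.
joints = np.zeros((25, dim))
for p, idx in enumerate(PARTS.values()):
    joints[idx] = text_emb[1, p]

pred, scores = predict_unseen(joints, text_emb)
print(pred, scores)
```

Averaging the part scores is only one possible way to combine them; the abstract does not say how the method aggregates part-level similarities or how the prompts enter the computation.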
Related papers
- Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition [64.56321246196859]
We propose a novel dyNamically Evolving dUal skeleton-semantic syneRgistic framework.
We first construct the spatial-temporal evolving micro-prototypes and integrate dynamic context-aware side information.
We introduce the spatial compression and temporal memory mechanisms to guide the growth of spatial-temporal micro-prototypes.
arXiv Detail & Related papers (2024-11-18T05:16:11Z)
- Zero-Shot Skeleton-based Action Recognition with Dual Visual-Text Alignment [11.72557768532557]
Key to zero-shot action recognition lies in aligning visual features with semantic vectors representing action categories.
Our approach achieves state-of-the-art performances on several popular zero-shot skeleton-based action recognition benchmarks.
arXiv Detail & Related papers (2024-09-22T06:44:58Z)
- Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition [57.97930719585095]
We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
arXiv Detail & Related papers (2024-06-19T08:22:32Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning [20.34477942813382]
Skeleton-based action representation learning aims to interpret and understand human behaviors by encoding the skeleton sequences.
We introduce a novel skeleton-based training framework based on Cross-modal Contrastive learning.
Our method outperforms the previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-05-31T03:40:15Z)
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition [10.403751563214113]
STD-CL is a framework to obtain discriminative and semantically distinct representations from the sequences.
STD-CL achieves solid improvements on NTU60, NTU120, and NW-UCLA benchmarks.
arXiv Detail & Related papers (2023-12-23T02:54:41Z)
- Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization [26.721082316870532]
Zero-shot skeleton-based action recognition aims to recognize actions of unseen categories after training on data of seen categories.
We propose a new zero-shot skeleton-based action recognition method via mutual information (MI) estimation and maximization.
arXiv Detail & Related papers (2023-08-07T23:41:55Z)
- Part-aware Prototypical Graph Network for One-shot Skeleton-based Action Recognition [57.86960990337986]
One-shot skeleton-based action recognition poses unique challenges in learning transferable representation from base classes to novel classes.
We propose a part-aware prototypical representation for one-shot skeleton-based action recognition.
We demonstrate the effectiveness of our method on two public skeleton-based action recognition datasets.
arXiv Detail & Related papers (2022-08-19T04:54:56Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.