Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
- URL: http://arxiv.org/abs/2406.13327v1
- Date: Wed, 19 Jun 2024 08:22:32 GMT
- Title: Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
- Authors: Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey
- Abstract summary: We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
- Score: 57.97930719585095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The latter employs an adaptive sampling strategy to group visual features from all body joint movements that are semantically relevant to a given description. Our approach is evaluated on various skeleton/language backbones and three large-scale datasets, i.e., NTU-RGB+D 60, NTU-RGB+D 120, and a newly curated dataset Kinetics-skeleton 200. The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains. The source codes can be accessed at https://github.com/azzh1/PURLS.
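The abstract describes the two modules only at a high level. As a rough, minimal illustration of the part-aware alignment idea (not the authors' implementation; the text encoder, the part descriptions, the pooling scheme, and all dimensions below are placeholder assumptions), the following Python sketch pools per-joint skeleton features under the guidance of a description embedding and scores a sequence against global and body-part-level descriptions of one class:

```python
import hashlib
import numpy as np

rng = np.random.default_rng(0)
D = 64          # shared embedding dimension (assumed)
T, J = 16, 25   # frames and joints (NTU-style 25-joint skeleton, assumed)

def encode_text(description):
    """Stand-in for a frozen language encoder (e.g. a CLIP/BERT text model).
    Deterministic random projection of the string, for illustration only."""
    seed = int(hashlib.sha256(description.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(D)
    return v / np.linalg.norm(v)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def adaptive_pool(joint_feats, text_emb):
    """Pool per-frame, per-joint features (T, J, D) into one vector, weighting
    each feature by its relevance to the description embedding."""
    flat = joint_feats.reshape(-1, D)                 # (T*J, D)
    weights = softmax(flat @ text_emb / np.sqrt(D))   # relevance over frames x joints
    pooled = (flat * weights[:, None]).sum(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-8)

def class_score(joint_feats, descriptions):
    """Average global- and part-level cosine similarities for one class."""
    sims = []
    for desc in descriptions.values():                # 'global', 'arms', 'legs', ...
        t = encode_text(desc)
        sims.append(float(adaptive_pool(joint_feats, t) @ t))
    return float(np.mean(sims))

# Hypothetical descriptions such as a language model might infer from "drink water"
drink_water = {
    "global": "a person lifts a cup to the mouth and tilts it to drink",
    "arms":   "one hand raises a cup toward the face",
    "legs":   "the legs stay still, standing or sitting",
}
skeleton = rng.standard_normal((T, J, D))             # stand-in for encoder output
print("score('drink water') =", class_score(skeleton, drink_water))
```

Zero-shot prediction would then take the argmax of such scores over the description sets of all unseen classes; the actual PURLS modules are trained end-to-end and differ in detail.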
Related papers
- Part Aware Contrastive Learning for Self-Supervised Action Recognition [18.423841093299135]
This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR.
Our proposed SkeAttnCLR outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.
arXiv Detail & Related papers (2023-05-01T05:31:48Z)
- LANDMARK: Language-guided Representation Enhancement Framework for Scene Graph Generation [34.40862385518366]
Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and the dataset long-tail problem.
We propose LANDMARK (LANguage-guiDed representation enhanceMent frAmewoRK), which learns predicate-relevant representations from language-vision interactive patterns.
This framework is model-agnostic and consistently improves performance on existing SGG models.
arXiv Detail & Related papers (2023-03-02T09:03:11Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Part-aware Prototypical Graph Network for One-shot Skeleton-based Action Recognition [57.86960990337986]
One-shot skeleton-based action recognition poses unique challenges in learning transferable representation from base classes to novel classes.
We propose a part-aware prototypical representation for one-shot skeleton-based action recognition.
We demonstrate the effectiveness of our method on two public skeleton-based action recognition datasets.
arXiv Detail & Related papers (2022-08-19T04:54:56Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset and unavailable for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks (a minimal sketch of the temporal variant appears after this list).
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by several factors.
Inspired by recent attention mechanisms, we propose a multi-grain contextual focus module, termed MCF, to capture action-associated relational information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z)
- UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition [11.81043814295441]
We introduce UNIK, a novel skeleton-based action recognition method that is able to generalize across datasets.
To study the cross-domain generalizability of action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK.
Results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms the state of the art when transferred to four target action classification datasets.
arXiv Detail & Related papers (2021-07-19T02:00:28Z)
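For the Temporal Spatial Cubism entry above, here is a minimal sketch of the segment-permutation pretext task, assuming three temporal segments and NTU-style skeleton tensors (the segment count and all shapes are assumptions, not the paper's exact setup):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
SEGMENTS = 3                                            # assumed number of segments
PERMS = list(itertools.permutations(range(SEGMENTS)))   # 6 permutation classes

def make_permutation_sample(sequence):
    """Split a skeleton sequence (T, J, C) into equal temporal segments, shuffle
    them, and return (shuffled sequence, index of the applied permutation).
    The self-supervised task is to classify that index from the shuffled input;
    the spatial variant permutes body-part groups instead of time segments."""
    T = sequence.shape[0] - sequence.shape[0] % SEGMENTS  # drop leftover frames
    segments = np.split(sequence[:T], SEGMENTS, axis=0)
    label = int(rng.integers(len(PERMS)))
    shuffled = np.concatenate([segments[i] for i in PERMS[label]], axis=0)
    return shuffled, label

# Toy usage: 30 frames, 25 joints, 3-D coordinates
seq = rng.standard_normal((30, 25, 3))
x, y = make_permutation_sample(seq)
print(x.shape, "permutation class:", y)
```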