One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton
Matching
- URL: http://arxiv.org/abs/2307.07286v2
- Date: Tue, 6 Feb 2024 12:02:40 GMT
- Title: One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton
Matching
- Authors: Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Alex C. Kot
- Abstract summary: One-shot skeleton action recognition aims to learn a skeleton action recognition model with a single training sample.
This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching.
- Score: 77.6989219290789
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-shot skeleton action recognition, which aims to learn a skeleton action
recognition model with a single training sample, has attracted increasing
interest due to the challenge of collecting and annotating large-scale skeleton
action data. However, most existing studies match skeleton sequences by
comparing their feature vectors directly which neglects spatial structures and
temporal orders of skeleton data. This paper presents a novel one-shot skeleton
action recognition technique that handles skeleton action recognition via
multi-scale spatial-temporal feature matching. We represent skeleton data at
multiple spatial and temporal scales and achieve optimal feature matching from
two perspectives. The first is multi-scale matching which captures the
scale-wise semantic relevance of skeleton data at multiple spatial and temporal
scales simultaneously. The second is cross-scale matching which handles
different motion magnitudes and speeds by capturing sample-wise relevance
across multiple scales. Extensive experiments over three large-scale datasets
(NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior
one-shot skeleton action recognition, and it outperforms the state-of-the-art
consistently by large margins.
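The two matching perspectives described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the feature extractor, scale count, and similarity measure are placeholders, not the authors' actual architecture. The sketch only shows the core idea: multi-scale matching averages scale-wise similarities between corresponding scales, while cross-scale matching compares features across all scale pairs to tolerate differing motion magnitudes and speeds.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def multi_scale_match(query_feats, support_feats):
    """Multi-scale matching: compare query and support features
    scale by scale (each list holds one vector per spatial-temporal
    scale) and average the scale-wise similarities."""
    return float(np.mean([cosine(q, s)
                          for q, s in zip(query_feats, support_feats)]))

def cross_scale_match(query_feats, support_feats):
    """Cross-scale matching: take the best similarity over all
    scale pairs, so a fast/large motion at one scale can still
    match a slow/small motion represented at another scale."""
    return max(cosine(q, s)
               for q in query_feats
               for s in support_feats)

# Toy example: three scales, 8-D features per scale.
rng = np.random.default_rng(0)
query = [rng.normal(size=8) for _ in range(3)]
support = [rng.normal(size=8) for _ in range(3)]

ms = multi_scale_match(query, support)
cs = cross_scale_match(query, support)
```

Note that the cross-scale score is always at least the multi-scale score, since a maximum over all scale pairs dominates the mean over corresponding pairs; in the paper both signals are combined, whereas this sketch reports them separately.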
Related papers
- SkeleTR: Towards Skeleton-based Action Recognition in the Wild [86.03082891242698]
SkeleTR is a new framework for skeleton-based action recognition.
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions.
It then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.
arXiv Detail & Related papers (2023-09-20T16:22:33Z) - Self-supervised Action Representation Learning from Partial
Spatio-Temporal Skeleton Sequences [29.376328807860993]
We propose a Partial Spatio-Temporal Learning (PSTL) framework to exploit the local relationship between different skeleton joints and video frames.
Our method achieves state-of-the-art performance on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD under various downstream tasks.
arXiv Detail & Related papers (2023-02-17T17:35:05Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Contrastive Learning from Spatio-Temporal Mixed Skeleton Sequences for
Self-Supervised Skeleton-Based Action Recognition [21.546894064451898]
We show that directly extending contrastive pairs based on normal augmentations brings limited returns in terms of performance.
We propose SkeleMixCLR: a contrastive learning framework with a spatio-temporal skeleton mixing augmentation (SkeleMix) to complement current contrastive learning approaches.
arXiv Detail & Related papers (2022-07-07T03:18:09Z) - Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based
Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z) - Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z) - Revisiting Skeleton-based Action Recognition [107.08112310075114]
PoseC3D is a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons.
On four challenging datasets, PoseC3D consistently obtains superior performance, both when used alone on skeletons and in combination with the RGB modality.
arXiv Detail & Related papers (2021-04-28T06:32:17Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.