MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition
- URL: http://arxiv.org/abs/2508.14889v1
- Date: Wed, 20 Aug 2025 17:58:03 GMT
- Title: MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition
- Authors: Mert Kiray, Alvaro Ritter, Nassir Navab, Benjamin Busam
- Abstract summary: Multi-Skeleton Contrastive Learning (MS-CLR) is a framework that aligns pose representations across multiple skeleton conventions extracted from the same sequence. MS-CLR consistently improves performance over strong single-skeleton contrastive learning baselines. A multi-skeleton ensemble further boosts performance, setting new state-of-the-art results on both datasets.
- Score: 49.91188543847175
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Contrastive learning has gained significant attention in skeleton-based action recognition for its ability to learn robust representations from unlabeled data. However, existing methods rely on a single skeleton convention, which limits their ability to generalize across datasets with diverse joint structures and anatomical coverage. We propose Multi-Skeleton Contrastive Learning (MS-CLR), a general self-supervised framework that aligns pose representations across multiple skeleton conventions extracted from the same sequence. This encourages the model to learn structural invariances and capture diverse anatomical cues, resulting in more expressive and generalizable features. To support this, we adapt the ST-GCN architecture to handle skeletons with varying joint layouts and scales through a unified representation scheme. Experiments on the NTU RGB+D 60 and 120 datasets demonstrate that MS-CLR consistently improves performance over strong single-skeleton contrastive learning baselines. A multi-skeleton ensemble further boosts performance, setting new state-of-the-art results on both datasets.
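The alignment idea in the abstract can be illustrated with a generic InfoNCE-style contrastive loss. This is a minimal sketch, not the paper's implementation: the abstract does not specify the loss, so the function `info_nce`, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration. Embeddings of the same sequence under two skeleton conventions are treated as a positive pair, and embeddings of other sequences serve as negatives.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (hypothetical helper, not from the paper).

    anchors[i] and positives[i] are assumed to embed the same sequence
    under two different skeleton conventions; positives[j] with j != i
    act as negatives for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    losses = []
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings for three sequences under two hypothetical conventions.
emb_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
emb_b_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Same vectors, but paired with the wrong sequences.
emb_b_shuffled = [emb_b_aligned[1], emb_b_aligned[2], emb_b_aligned[0]]

aligned_loss = info_nce(emb_a, emb_b_aligned)
shuffled_loss = info_nce(emb_a, emb_b_shuffled)
print(aligned_loss < shuffled_loss)
```

The loss is low when each sequence's two skeleton views land close together and high when they are mismatched, which is the pressure that would push an encoder toward convention-invariant features.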
Related papers
- Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification [60.939250172443586]
Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios. Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning. This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos). MoCos exploits structure-specific and gait-related body relations as well as features of skeleton graphs to learn effective skeleton representations for person re-ID.
arXiv Detail & Related papers (2024-12-12T08:13:29Z)
- Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling [58.50618448027103]
Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. This paper explores the differences across various CLIP-trained vision backbones. The method achieves a remarkable increase in accuracy of up to 39.1% over the best single backbone.
arXiv Detail & Related papers (2024-05-27T12:59:35Z)
- SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z)
- Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification [63.903237777588316]
Person re-identification (re-ID) via 3D skeletons is an important emerging topic with many merits.
Existing solutions rarely explore valuable body-component relations in skeletal structure or motion.
This paper proposes a generic unsupervised Prototype Contrastive learning paradigm with Multi-level Graph Relation learning.
arXiv Detail & Related papers (2022-08-25T00:59:32Z)
- Contrastive Learning from Spatio-Temporal Mixed Skeleton Sequences for Self-Supervised Skeleton-Based Action Recognition [21.546894064451898]
We show that directly extending contrastive pairs based on normal augmentations brings limited returns in terms of performance.
We propose SkeleMixCLR: a contrastive learning framework with a spatio-temporal skeleton mixing augmentation (SkeleMix) to complement current contrastive learning approaches.
arXiv Detail & Related papers (2022-07-07T03:18:09Z)
- SimMC: Simple Masked Contrastive Learning of Skeleton Representations for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
- Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z)