Part Aware Contrastive Learning for Self-Supervised Action Recognition
- URL: http://arxiv.org/abs/2305.00666v2
- Date: Thu, 11 May 2023 07:26:18 GMT
- Title: Part Aware Contrastive Learning for Self-Supervised Action Recognition
- Authors: Yilei Hua, Wenhan Wu, Ce Zheng, Aidong Lu, Mengyuan Liu, Chen Chen,
Shiqian Wu
- Abstract summary: This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR.
Our proposed SkeAttnCLR outperforms state-of-the-art methods on NTURGB+D, NTU120-RGB+D, and PKU-MMD datasets.
- Score: 18.423841093299135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, remarkable results have been achieved in self-supervised
action recognition using skeleton sequences with contrastive learning. It has
been observed that the semantic distinction of human action features is often
represented by local body parts, such as legs or hands, which are advantageous
for skeleton-based action recognition. This paper proposes an attention-based
contrastive learning framework for skeleton representation learning, called
SkeAttnCLR, which integrates local similarity and global features for
skeleton-based action representations. To achieve this, a multi-head attention
mask module is employed to learn the soft attention mask features from the
skeletons, suppressing non-salient local features while accentuating local
salient features, thereby bringing similar local features closer in the feature
space. Additionally, ample contrastive pairs are generated by expanding
contrastive pairs based on salient and non-salient features with global
features, which guide the network to learn the semantic representations of the
entire skeleton. Therefore, with the attention mask mechanism, SkeAttnCLR
learns local features under different data augmentation views. The experimental
results demonstrate that the inclusion of local feature similarity
significantly enhances skeleton-based action representation. Our proposed
SkeAttnCLR outperforms state-of-the-art methods on NTURGB+D, NTU120-RGB+D, and
PKU-MMD datasets.
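To make the mechanism in the abstract more concrete, here is a minimal, self-contained PyTorch sketch of the idea: a soft attention mask over per-joint features separates salient from non-salient skeleton features, which are then paired with global features under a contrastive loss. This is not the authors' implementation; the module names, tensor shapes, saliency pooling, and the exact pairing of losses are illustrative assumptions only.

```python
# Illustrative sketch only (not the SkeAttnCLR code): soft attention masks
# split per-joint features into salient / non-salient parts, which are paired
# with global features in a simple InfoNCE-style contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftAttentionMask(nn.Module):
    """Derives a soft per-joint saliency mask from multi-head attention scores."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor):
        # x: (batch, joints, dim) per-joint skeleton features
        _, attn_weights = self.attn(x, x, x, need_weights=True)   # (B, J, J)
        # Average attention each joint receives -> soft saliency weights
        mask = attn_weights.mean(dim=1).unsqueeze(-1)             # (B, J, 1)
        salient = (x * mask).mean(dim=1)            # accentuate salient joints
        non_salient = (x * (1.0 - mask)).mean(dim=1)  # suppress salient joints
        return salient, non_salient


def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Simplified InfoNCE between two batches of embeddings (positives on the diagonal)."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    B, J, D = 8, 25, 128                       # batch, joints, feature dim
    masker = SoftAttentionMask(D)
    # Two augmented views of the same skeleton sequence (placeholder features)
    view1, view2 = torch.randn(B, J, D), torch.randn(B, J, D)
    s1, n1 = masker(view1)
    s2, n2 = masker(view2)
    g1, g2 = view1.mean(dim=1), view2.mean(dim=1)   # global features
    # Expanded contrastive pairs: salient-salient, non-salient-non-salient,
    # plus salient / non-salient features against global features.
    loss = info_nce(s1, s2) + info_nce(n1, n2) + info_nce(s1, g2) + info_nce(n1, g2)
    print(loss.item())
```

In practice the per-joint features would come from a skeleton encoder (e.g. a graph convolutional backbone) rather than random tensors, and the precise set of positive/negative pairs would follow the paper; the sketch only shows how soft masking can expand the pool of contrastive pairs beyond global-global comparisons.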
Related papers
- Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition [57.97930719585095]
We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
arXiv Detail & Related papers (2024-06-19T08:22:32Z) - An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - Self-supervised Action Representation Learning from Partial
Spatio-Temporal Skeleton Sequences [29.376328807860993]
We propose a Partial Spatio-Temporal Learning (PSTL) framework to exploit the local relationship between different skeleton joints and video frames.
Our method achieves state-of-the-art performance on NTURGB+D 60, NTURGB+D 120 and PKU-MMD under various downstream tasks.
arXiv Detail & Related papers (2023-02-17T17:35:05Z) - Adaptive Local-Component-aware Graph Convolutional Network for One-shot
Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z) - Kinship Verification Based on Cross-Generation Feature Interaction
Learning [53.62256887837659]
Kinship verification from facial images has been recognized as an emerging yet challenging technique in computer vision applications.
We propose a novel cross-generation feature interaction learning (CFIL) framework for robust kinship verification.
arXiv Detail & Related papers (2021-09-07T01:50:50Z) - UNIK: A Unified Framework for Real-world Skeleton-based Action
Recognition [11.81043814295441]
We introduce UNIK, a novel skeleton-based action recognition method that is able to generalize across datasets.
To study the cross-domain generalizability of action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK.
Results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms state-of-the-art when transferred onto four target action classification datasets.
arXiv Detail & Related papers (2021-07-19T02:00:28Z) - HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based
Gesture Recognition [73.64451471862613]
We propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition.
A joint self-attention module is used to capture spatial features of fingers, while a finger self-attention module is designed to aggregate features of the whole hand.
Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.
arXiv Detail & Related papers (2021-06-25T02:15:53Z) - A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)