Hierarchical Consistent Contrastive Learning for Skeleton-Based Action
Recognition with Growing Augmentations
- URL: http://arxiv.org/abs/2211.13466v3
- Date: Mon, 10 Jul 2023 10:48:24 GMT
- Title: Hierarchical Consistent Contrastive Learning for Skeleton-Based Action
Recognition with Growing Augmentations
- Authors: Jiahang Zhang, Lilang Lin, Jiaying Liu
- Abstract summary: We propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition.
Specifically, we first design a gradual growing augmentation policy to generate multiple ordered positive pairs.
Then, an asymmetric loss is proposed to enforce the hierarchical consistency via a directional clustering operation.
- Score: 33.68311764817763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has been proven beneficial for self-supervised
skeleton-based action recognition. Most contrastive learning methods utilize
carefully designed augmentations to generate different movement patterns of
skeletons for the same semantics. However, applying strong augmentations,
which distort the images/skeletons' structures and cause semantic loss,
remains an open issue because they destabilize training. In this paper, we
investigate the potential of adopting strong augmentations and propose a
general hierarchical consistent contrastive learning framework (HiCLR) for
skeleton-based action recognition. Specifically, we first design a gradual
growing augmentation policy to generate multiple ordered positive pairs, which
guide the model toward consistency of the representations learned from different
views. Then, an asymmetric loss is proposed to enforce the hierarchical
consistency via a directional clustering operation in the feature space,
pulling the representations from strongly augmented views closer to those from
weakly augmented views for better generalizability. Meanwhile, we propose and
evaluate three kinds of strong augmentations for 3D skeletons to demonstrate
the effectiveness of our method. Extensive experiments show that HiCLR
outperforms the state-of-the-art methods notably on three large-scale datasets,
i.e., NTU60, NTU120, and PKUMMD.
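The asymmetric loss described in the abstract can be sketched as a one-directional InfoNCE-style objective. The following is a minimal NumPy sketch under stated assumptions: the function and embedding names are hypothetical, and in a real autodiff framework the weak branch would be detached (stop-gradient) so that clustering is directional, i.e., strong views are pulled toward weak views and not vice versa.

```python
import numpy as np

def l2_normalize(z):
    """Project embeddings onto the unit hypersphere."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def asymmetric_consistency_loss(z_strong, z_weak, temperature=0.1):
    """One-way InfoNCE-style loss: pull each strongly augmented view
    toward its weakly augmented counterpart, never the reverse.
    In an autodiff framework, z_weak would be detached (stop-gradient)
    so gradients only update the strong branch."""
    zs = l2_normalize(z_strong)
    zw = l2_normalize(z_weak)
    logits = zs @ zw.T / temperature          # (N, N) scaled cosine similarities
    # positives lie on the diagonal: sample i's strong view vs its weak view
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(zs)
    return -log_prob[np.arange(n), np.arange(n)].mean()
```

As a sanity check, when the strong and weak embeddings coincide the loss is near zero, while mismatched random embeddings yield a loss near log N.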
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based
Action Recognition [22.067143671631303]
Self-supervised skeleton-based action recognition enjoys a rapid growth along with the development of contrastive learning.
We propose a Cross-Stream Contrastive Learning framework for skeleton-based action Representation learning (CSCLR).
Specifically, the proposed CSCLR not only utilizes intra-stream contrast pairs, but introduces inter-stream contrast pairs as hard samples to formulate a better representation learning.
arXiv Detail & Related papers (2023-05-03T10:31:35Z)
- Understanding and Constructing Latent Modality Structures in Multi-modal
Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z)
- Contrastive Learning from Spatio-Temporal Mixed Skeleton Sequences for
Self-Supervised Skeleton-Based Action Recognition [21.546894064451898]
We show that directly extending contrastive pairs based on normal augmentations brings limited returns in terms of performance.
We propose SkeleMixCLR: a contrastive learning framework with a spatio-temporal skeleton mixing augmentation (SkeleMix) to complement current contrastive learning approaches.
arXiv Detail & Related papers (2022-07-07T03:18:09Z)
- Strongly Augmented Contrastive Clustering [52.00792661612913]
We present an end-to-end deep clustering approach termed strongly augmented contrastive clustering (SACC).
We utilize a backbone network with triply-shared weights, where a strongly augmented view and two weakly augmented views are incorporated.
Based on the representations produced by the backbone, the weak-weak view pair and the strong-weak view pairs are simultaneously exploited for the instance-level contrastive learning.
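The pairing scheme above (one weak-weak pair plus two strong-weak pairs over the triply-shared-weight backbone) can be sketched as follows. This is an illustrative NumPy sketch: the `info_nce` helper, the embedding names, and the equal weighting of the three terms are assumptions, not the paper's exact formulation.

```python
import numpy as np

def info_nce(za, zb, temperature=0.5):
    """Minimal InfoNCE over a batch of paired embeddings (diagonal positives)."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()

def sacc_loss(z_weak1, z_weak2, z_strong):
    """Combine the weak-weak pair with the two strong-weak pairs,
    mirroring the pairing scheme described in the summary."""
    return (info_nce(z_weak1, z_weak2)
            + info_nce(z_strong, z_weak1)
            + info_nce(z_strong, z_weak2))
```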
arXiv Detail & Related papers (2022-06-01T10:30:59Z)
- SimMC: Simple Masked Contrastive Learning of Skeleton Representations
for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
- Improving Contrastive Learning with Model Augmentation [123.05700988581806]
Sequential recommendation aims to predict the next items in a user's behavior sequence, which can be solved by characterizing item relationships in sequences.
Due to the data sparsity and noise issues in sequences, a new self-supervised learning (SSL) paradigm is proposed to improve the performance.
arXiv Detail & Related papers (2022-03-25T06:12:58Z)
- Contrastive Learning from Extremely Augmented Skeleton Sequences for
Self-supervised Action Recognition [23.27198457894644]
A Contrastive Learning framework utilizing Abundant Information Mining for self-supervised action Representation (AimCLR) is proposed.
First, the extreme augmentations and the Energy-based Attention-guided Drop Module (EADM) are proposed to obtain diverse positive samples.
Third, the Nearest Neighbors Mining (NNM) is proposed to further expand positive samples to make the abundant information mining process more reasonable.
arXiv Detail & Related papers (2021-12-07T09:38:37Z)
- Contrast-reconstruction Representation Learning for Self-supervised
Skeleton-based Action Recognition [18.667198945509114]
We propose a novel Contrast-Reconstruction Representation Learning network (CRRL)
It simultaneously captures postures and motion dynamics for unsupervised skeleton-based action recognition.
Experimental results on several benchmarks, i.e., NTU RGB+D 60, NTU RGB+D 120, CMU mocap, and NW-UCLA, demonstrate the promise of the proposed CRRL method.
arXiv Detail & Related papers (2021-11-22T08:45:34Z)
- A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)
- Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM
for Unsupervised Action Recognition [16.22360992454675]
Action recognition via 3D skeleton data has become an important emerging topic in recent years.
In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL.
Our approach typically improves existing hand-crafted methods by 10-50% top-1 accuracy.
arXiv Detail & Related papers (2020-08-01T06:37:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.