Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based
Action Recognition
- URL: http://arxiv.org/abs/2305.02324v2
- Date: Thu, 26 Oct 2023 03:38:48 GMT
- Title: Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based
Action Recognition
- Authors: Ding Li and Yongqiang Tang and Zhizhong Zhang and Wensheng Zhang
- Abstract summary: Self-supervised skeleton-based action recognition has grown rapidly alongside the development of contrastive learning.
We propose a Cross-Stream Contrastive Learning framework for skeleton-based action Representation learning (CSCLR).
Specifically, the proposed CSCLR not only utilizes intra-stream contrast pairs but also introduces inter-stream contrast pairs as hard samples to learn better representations.
- Score: 22.067143671631303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised skeleton-based action recognition has grown rapidly
alongside the development of contrastive learning. Existing methods rely on
imposing invariance to augmentations of the 3D skeleton within a single data
stream, which leverages only easy positive pairs and limits the ability to
explore complicated movement patterns. In this paper, we argue that the defect
of single-stream contrast and the lack of necessary feature transformation are
responsible for these easy positives, and therefore propose a Cross-Stream
Contrastive Learning framework for skeleton-based action Representation
learning (CSCLR). Specifically, the proposed CSCLR not only utilizes
intra-stream contrast pairs but also introduces inter-stream contrast pairs as
hard samples to learn better representations. In addition, to further exploit
the potential of positive pairs and increase the robustness of self-supervised
representation learning, we propose a Positive Feature Transformation (PFT)
strategy that adopts feature-level manipulation to increase the variance of
positive pairs. To validate the effectiveness of our method, we conduct
extensive experiments on three benchmark datasets: NTU-RGB+D 60, NTU-RGB+D 120,
and PKU-MMD. Experimental results show that the proposed CSCLR outperforms
state-of-the-art methods across a diverse range of evaluation protocols.
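
The abstract describes two mechanisms: inter-stream contrast pairs used as hard positives, and a Positive Feature Transformation (PFT) applied to positives in feature space. The snippet below is a minimal sketch of those two ideas, not the authors' implementation; the stream choice (joint vs. motion), the per-stream negative memory banks, and the extrapolation form of PFT are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of the two ideas in the
# abstract: inter-stream contrast pairs as hard positives, and Positive
# Feature Transformation (PFT) applied to positives in feature space.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    """Standard InfoNCE loss: one positive per anchor, a shared negative bank."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    negatives = F.normalize(negatives, dim=1)
    pos_logit = (anchor * positive).sum(dim=1, keepdim=True) / temperature  # (N, 1)
    neg_logits = anchor @ negatives.t() / temperature                       # (N, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

def positive_feature_transform(z_pos, z_anchor, alpha=0.5):
    """PFT as assumed here: extrapolate the positive away from the anchor in
    feature space, increasing the variance of positive pairs."""
    lam = 1.0 + alpha * torch.rand(z_pos.size(0), 1, device=z_pos.device)
    return lam * z_pos + (1.0 - lam) * z_anchor

def cscl_loss(joint_q, joint_k, motion_q, motion_k, bank_joint, bank_motion):
    """joint_*/motion_* are embeddings of two augmented views from a joint
    stream and a motion stream; bank_* are per-stream negative memory banks."""
    # Intra-stream contrast: the other augmented view of the same stream.
    loss = info_nce(joint_q, positive_feature_transform(joint_k, joint_q), bank_joint)
    loss += info_nce(motion_q, positive_feature_transform(motion_k, motion_q), bank_motion)
    # Inter-stream contrast: a view embedded by the other stream acts as a hard positive.
    loss += info_nce(joint_q, positive_feature_transform(motion_k, joint_q), bank_joint)
    loss += info_nce(motion_q, positive_feature_transform(joint_k, motion_q), bank_motion)
    return loss / 4.0
```

The inter-stream terms treat a view embedded by the other stream as a harder positive, which is the mechanism the abstract credits for moving beyond easy, single-stream positives.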
Related papers
- Contrastive Learning Via Equivariant Representation [19.112460889771423]
We propose CLeVER, a novel equivariant contrastive learning framework compatible with augmentation strategies of arbitrary complexity.
Experimental results demonstrate that CLeVER effectively extracts and incorporates equivariant information from practical natural images.
arXiv Detail & Related papers (2024-06-01T01:53:51Z)
- CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective [48.99488315273868]
We present a contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints.
Our method minimizes logit differences within the same sample by considering their numerical values.
We conduct comprehensive experiments on three datasets including CIFAR-100, ImageNet-1K, and MS COCO.
arXiv Detail & Related papers (2024-04-22T11:52:40Z)
- Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms existing federated learning approaches by significant margins on standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z)
- Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition [19.86316311525552]
This paper first applies the contrastive learning method BYOL to skeleton data, forming a SkeletonBYOL baseline.
Building on SkeletonBYOL, it further presents a Cross-Model and Cross-Stream framework.
arXiv Detail & Related papers (2023-07-15T12:37:18Z)
- Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations [33.68311764817763]
We propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition.
Specifically, we first design a gradually growing augmentation policy to generate multiple ordered positive pairs.
Then, an asymmetric loss is proposed to enforce the hierarchical consistency via a directional clustering operation.
arXiv Detail & Related papers (2022-11-24T08:09:50Z)
- PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation under Adversarial Attack [73.3371797787823]
Adversarial contrastive learning (ACL) is considered an effective way to improve the robustness of pre-trained models.
We present a robustness-aware loss function to adversarially train the self-supervised contrastive learning framework.
We validate our method, PointACL, on downstream tasks including 3D classification and 3D segmentation across multiple datasets.
arXiv Detail & Related papers (2022-09-14T22:58:31Z)
- RényiCL: Contrastive Representation Learning with Skew Rényi Divergence [78.15455360335925]
We present a new robust contrastive learning scheme, coined RényiCL, which can effectively manage harder augmentations.
Our method is built upon the variational lower bound of Rényi divergence.
We show that Rényi contrastive learning objectives perform innate hard negative sampling and easy positive sampling simultaneously.
arXiv Detail & Related papers (2022-08-12T13:37:05Z)
- Contrastive Instruction-Trajectory Learning for Vision-Language Navigation [66.16980504844233]
A vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instructions.
Previous works fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions.
We propose a Contrastive Instruction-Trajectory Learning framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation.
arXiv Detail & Related papers (2021-12-08T06:32:52Z)
- Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition [23.27198457894644]
A Contrastive Learning framework utilizing Abundant Information Mining for self-supervised action Representation (AimCLR) is proposed.
Extreme augmentations and an Energy-based Attention-guided Drop Module (EADM) are proposed to obtain diverse positive samples.
Nearest Neighbors Mining (NNM) is further proposed to expand the positive samples and make the abundant information mining process more reasonable.
arXiv Detail & Related papers (2021-12-07T09:38:37Z)
- 3D Human Action Representation Learning via Cross-View Consistency Pursuit [52.19199260960558]
We propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action Representation (CrosSCLR).
CrosSCLR consists of both single-view contrastive learning (SkeletonCLR) and cross-view consistent knowledge mining (CVC-KM) modules, integrated in a collaborative learning manner; a sketch of this cross-view positive-mining idea appears after this list.
arXiv Detail & Related papers (2021-04-29T16:29:41Z)
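
Two of the skeleton-specific works above, CrosSCLR (cross-view consistent knowledge mining) and AimCLR (Nearest Neighbors Mining), share a common mechanism: memory-bank entries most similar to the anchor are promoted to extra positives for the contrastive loss. The sketch below is an illustrative simplification of that shared idea, not either paper's code; the top-k mining rule, the bank handling, and the cross-view reuse of the mined mask are assumptions.

```python
# Illustrative sketch (assumed) of nearest-neighbor positive mining as used,
# in different forms, by CrosSCLR's CVC-KM and AimCLR's NNM.
import torch
import torch.nn.functional as F

def mine_positive_mask(query, memory_bank, topk=5):
    """Mark the top-k most similar memory-bank entries as extra positives."""
    sim = F.normalize(query, dim=1) @ F.normalize(memory_bank, dim=1).t()  # (N, K)
    idx = sim.topk(topk, dim=1).indices
    mask = torch.zeros_like(sim, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

def contrast_with_mined_positives(query, key, memory_bank, pos_mask, temperature=0.07):
    """InfoNCE-style loss where mined bank entries count as positives in
    addition to the usual augmented key."""
    q = F.normalize(query, dim=1)
    k = F.normalize(key, dim=1)
    bank = F.normalize(memory_bank, dim=1)
    logits = torch.cat([(q * k).sum(1, keepdim=True), q @ bank.t()], dim=1) / temperature
    # Column 0 (the augmented key) is always positive; mined columns follow.
    pos = torch.cat([torch.ones_like(pos_mask[:, :1]), pos_mask], dim=1).float()
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1)).mean()

# Cross-view usage (illustrative): mine neighbors with one view's embeddings
# and reuse the mask in the other view's loss, so confident positives found
# in one view guide learning in the other.
# mask = mine_positive_mask(z_joint_q, bank_joint)
# loss = contrast_with_mined_positives(z_motion_q, z_motion_k, bank_motion, mask)
```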
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences.