Contrast-reconstruction Representation Learning for Self-supervised
Skeleton-based Action Recognition
- URL: http://arxiv.org/abs/2111.11051v1
- Date: Mon, 22 Nov 2021 08:45:34 GMT
- Title: Contrast-reconstruction Representation Learning for Self-supervised
Skeleton-based Action Recognition
- Authors: Peng Wang, Jun Wen, Chenyang Si, Yuntao Qian, Liang Wang
- Abstract summary: We propose a novel Contrast-Reconstruction Representation Learning network (CRRL).
It simultaneously captures postures and motion dynamics for unsupervised skeleton-based action recognition.
Experimental results on several benchmarks, i.e., NTU RGB+D 60, NTU RGB+D 120, CMU mocap, and NW-UCLA, demonstrate the promise of the proposed CRRL method.
- Score: 18.667198945509114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skeleton-based action recognition is widely used in varied areas, e.g.,
surveillance and human-machine interaction. Existing models are mainly learned
in a supervised manner, thus heavily depending on large-scale labeled data
which could be infeasible when labels are prohibitively expensive. In this
paper, we propose a novel Contrast-Reconstruction Representation Learning
network (CRRL) that simultaneously captures postures and motion dynamics for
unsupervised skeleton-based action recognition. It mainly consists of three
parts: Sequence Reconstructor, Contrastive Motion Learner, and Information
Fuser. The Sequence Reconstructor learns representation from skeleton
coordinate sequence via reconstruction, thus the learned representation tends
to focus on trivial postural coordinates and be hesitant in motion learning. To
enhance the learning of motions, the Contrastive Motion Learner performs
contrastive learning between the representations learned from coordinate
sequence and additional velocity sequence, respectively. Finally, in the
Information Fuser, we explore varied strategies to combine the Sequence
Reconstructor and Contrastive Motion Learner, and propose to capture postures
and motions simultaneously via a knowledge-distillation based fusion strategy
that transfers the motion learning from the Contrastive Motion Learner to the
Sequence Reconstructor. Experimental results on several benchmarks, i.e., NTU
RGB+D 60, NTU RGB+D 120, CMU mocap, and NW-UCLA, demonstrate the promise of the
proposed CRRL method by far outperforming state-of-the-art approaches.
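The contrastive step described in the abstract (pairing representations of the coordinate sequence with representations of the velocity sequence) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the loss, the encoder, or how velocity is computed, so the frame-to-frame difference, the InfoNCE-style loss, the temperature value, and the `toy_encoder` helper are all assumptions made for illustration.

```python
import numpy as np


def velocity_sequence(coords):
    """Assumed velocity definition: frame-to-frame differences.

    coords has shape (T, J, 3): T frames, J joints, 3D coordinates.
    Returns an array of shape (T - 1, J, 3).
    """
    return coords[1:] - coords[:-1]


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(z_coord, z_vel, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not taken from the paper).

    Row i of z_coord and row i of z_vel come from the same sample and form
    the positive pair; every other row in the batch acts as a negative.
    """
    z_coord = l2_normalize(z_coord)
    z_vel = l2_normalize(z_vel)
    logits = z_coord @ z_vel.T / temperature            # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def toy_encoder(seq, weight):
    """Hypothetical stand-in encoder: mean-pool over time, then project."""
    feats = seq.reshape(seq.shape[0], -1).mean(axis=0)   # (J * 3,)
    return feats @ weight                                # (D,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(4, 16, 25, 3))   # 4 clips, 16 frames, 25 joints
    w_coord = rng.normal(size=(75, 8))        # random projection weights
    w_vel = rng.normal(size=(75, 8))
    z_c = np.stack([toy_encoder(s, w_coord) for s in batch])
    z_v = np.stack([toy_encoder(velocity_sequence(s), w_vel) for s in batch])
    print("contrastive loss:", info_nce(z_c, z_v))
```

In a trained model the two random projections would be replaced by learned sequence encoders, and the loss would be minimized so that the coordinate and velocity embeddings of the same clip agree.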
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition [39.99711066167837]
This paper introduces a contrastive learning framework, namely the Spatiotemporal Clues Disentanglement Network (SCD-Net).
Specifically, we integrate the sequences with a feature extractor to derive explicit clues from spatial and temporal domains respectively.
We conduct evaluations on the NTU-RGB+D (60&120) and PKU-MMD (I&II) datasets, covering various downstream tasks such as action recognition, action retrieval, and transfer learning.
arXiv Detail & Related papers (2023-09-11T21:32:13Z)
- Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition [22.067143671631303]
Self-supervised skeleton-based action recognition enjoys a rapid growth along with the development of contrastive learning.
We propose a Cross-Stream Contrastive Learning framework for skeleton-based action Representation learning (CSCLR).
Specifically, the proposed CSCLR not only utilizes intra-stream contrast pairs, but introduces inter-stream contrast pairs as hard samples to formulate a better representation learning.
arXiv Detail & Related papers (2023-05-03T10:31:35Z)
- Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations [33.68311764817763]
We propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition.
Specifically, we first design a gradual growing augmentation policy to generate multiple ordered positive pairs.
Then, an asymmetric loss is proposed to enforce the hierarchical consistency via a directional clustering operation.
arXiv Detail & Related papers (2022-11-24T08:09:50Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce representation that emphasizes the novel information in the frame of the current time-stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- 3D Human Action Representation Learning via Cross-View Consistency Pursuit [52.19199260960558]
We propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action Representation (CrosSCLR).
CrosSCLR consists of both single-view contrastive learning (SkeletonCLR) and cross-view consistent knowledge mining (CVC-KM) modules, integrated in a collaborative learning manner.
arXiv Detail & Related papers (2021-04-29T16:29:41Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning [114.58986229852489]
In this paper, we explore the basic and generic supervision in the sequence from spatial, sequential and temporal perspectives.
We derive a particular form named Sequence Contrastive Learning (SeCo).
SeCo shows superior results under the linear protocol on action recognition, untrimmed activity recognition and object tracking.
arXiv Detail & Related papers (2020-08-03T15:51:35Z)
- Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition [16.22360992454675]
Action recognition via 3D skeleton data has become an important emerging topic in recent years.
In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL.
Our approach typically improves existing hand-crafted methods by 10-50% top-1 accuracy.
arXiv Detail & Related papers (2020-08-01T06:37:57Z)
- Hierarchical Contrastive Motion Learning for Video Action Recognition [100.9807616796383]
We present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames.
Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network.
Our motion learning module is lightweight and flexible to be embedded into various backbone networks.
arXiv Detail & Related papers (2020-07-20T17:59:22Z)