Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition
- URL: http://arxiv.org/abs/2008.00188v4
- Date: Fri, 2 Apr 2021 08:14:45 GMT
- Title: Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition
- Authors: Haocong Rao, Shihao Xu, Xiping Hu, Jun Cheng, Bin Hu
- Abstract summary: Action recognition via 3D skeleton data has emerged as an important topic in recent years.
In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL.
Our approach typically improves top-1 accuracy over existing hand-crafted methods by 10-50%.
- Score: 16.22360992454675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action recognition via 3D skeleton data has emerged as an important
topic in recent years. Most existing methods either extract hand-crafted
descriptors or learn action representations with supervised learning paradigms
that require massive amounts of labeled data. In this paper, we propose, for the
first time, a contrastive action learning paradigm named AS-CAL that leverages
different augmentations of unlabeled skeleton data to learn action
representations in an unsupervised manner. Specifically, we first propose to
contrast the similarity between augmented instances (query and key) of the
input skeleton sequence, which are transformed by multiple novel augmentation
strategies, to learn the inherent action patterns ("pattern-invariance") across
different skeleton transformations. Second, to encourage learning the
pattern-invariance with more consistent action representations, we propose a
momentum LSTM, implemented as the momentum-based moving average of the
LSTM-based query encoder, to encode the long-term action dynamics of the key
sequence. Third, we introduce a queue that stores the encoded keys, which allows
our model to flexibly reuse preceding keys and build a more consistent
dictionary to improve contrastive learning. Last, by temporally averaging the
hidden states of actions learned by the query encoder, we propose a novel
representation named Contrastive Action Encoding (CAE) to represent human
actions effectively. Extensive experiments show that our approach typically
improves top-1 accuracy over existing hand-crafted methods by 10-50%, and that
it can achieve performance comparable or even superior to numerous supervised
learning methods.
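The training loop outlined in the abstract (augmented query and key views, a momentum-updated key encoder, a queue of previously encoded keys, and a temporally averaged encoding) can be sketched as follows. This is a minimal, hypothetical pure-Python sketch, not the authors' implementation: a toy tanh RNN stands in for the LSTM encoder, Gaussian jitter stands in for the paper's augmentation strategies, and the InfoNCE loss form, the temperature `tau`, the momentum `m`, the queue size `K`, and every function name are assumptions. Feeding the CAE (rather than, say, the last hidden state) into the loss is also an assumption, and the gradient update of the query encoder is omitted.

```python
import math
import random
from collections import deque


def encode_hidden_states(seq, W, U):
    """Toy tanh RNN standing in for the LSTM encoder (assumption, not AS-CAL's LSTM).

    seq: list of T frames, each a list of D joint coordinates.
    Returns the T hidden-state vectors h_1..h_T.
    """
    h = [0.0] * len(W)
    states = []
    for x in seq:
        # The comprehension reads the previous h before h is rebound.
        h = [math.tanh(sum(w * xi for w, xi in zip(W[i], x))
                       + sum(u * hj for u, hj in zip(U[i], h)))
             for i in range(len(W))]
        states.append(h)
    return states


def contrastive_action_encoding(states):
    """CAE as described in the abstract: the temporal average of the hidden states."""
    T = len(states)
    return [sum(col) / T for col in zip(*states)]


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def momentum_update(key_params, query_params, m=0.999):
    """Key encoder <- m * key encoder + (1 - m) * query encoder, element-wise."""
    return [[m * k + (1 - m) * q for k, q in zip(krow, qrow)]
            for krow, qrow in zip(key_params, query_params)]


def info_nce(q, k_pos, negatives, tau=0.07):
    """Assumed InfoNCE loss: -log of the softmax score of the positive key."""
    logits = [dot(q, k_pos) / tau] + [dot(q, k) / tau for k in negatives]
    mx = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum_exp - logits[0]


def augment(seq, sigma=0.05):
    """Hypothetical augmentation: Gaussian jitter on every coordinate."""
    return [[x + random.gauss(0.0, sigma) for x in frame] for frame in seq]


def demo(steps=5, D=3, H=4, T=6, K=4, m=0.999, seed=0):
    random.seed(seed)

    def rand(rows, cols):
        return [[random.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

    Wq, Uq = rand(H, D), rand(H, H)
    Wk, Uk = [row[:] for row in Wq], [row[:] for row in Uq]  # key encoder starts as a copy
    queue = deque(maxlen=K)  # FIFO: the oldest key is dropped once K keys are stored
    seq = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
    losses = []
    for _ in range(steps):
        q = l2_normalize(contrastive_action_encoding(encode_hidden_states(augment(seq), Wq, Uq)))
        k = l2_normalize(contrastive_action_encoding(encode_hidden_states(augment(seq), Wk, Uk)))
        losses.append(info_nce(q, k, list(queue)))  # queued keys act as negatives
        # A real trainer would backpropagate the loss into Wq, Uq here (omitted).
        Wk, Uk = momentum_update(Wk, Wq, m), momentum_update(Uk, Uq, m)
        queue.append(k)  # reuse this key as a negative in later steps
    return losses, queue


losses, queue = demo()
print([round(l, 3) for l in losses], len(queue))
```

With the defaults, the first loss is exactly 0.0 because the queue starts empty (the positive key is the only candidate), and the queue holds 4 keys after 5 steps. Using `deque(maxlen=K)` mirrors the idea of a fixed-size dictionary in which the oldest keys are discarded as new ones arrive.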
Related papers
- Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning [20.34477942813382]
Skeleton-based action representation learning aims to interpret and understand human behaviors by encoding the skeleton sequences.
We introduce a novel skeleton-based training framework based on Cross-modal Contrastive learning.
Our method outperforms the previous methods and achieves state-of-the-art results.
(2024-05-31T03:40:15Z)
- ReconBoost: Boosting Can Achieve Modality Reconcilement [89.4377895465204]
We study the modality-alternating learning paradigm to achieve reconcilement.
We propose a new method called ReconBoost to update a fixed modality each time.
We show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others.
(2024-05-15T13:22:39Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
(2024-01-01T12:08:35Z)
- KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All [26.506535205897443]
We introduce a novel key-query learning strategy to enhance prompt matching efficiency and address the challenge of shifting features.
Our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.
(2023-11-26T20:35:19Z)
- Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning [33.68311764817763]
We propose Prompted Contrast with Masked Motion Modeling, PCM³, for versatile 3D action representation learning.
Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner.
Experiments on five downstream tasks across three large-scale datasets demonstrate the superior generalization capacity of PCM³ compared with state-of-the-art works.
(2023-08-08T01:27:55Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
During training, action labels are available only for a source dataset, not for the target dataset.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
(2022-07-17T07:05:39Z)
- SimMC: Simple Masked Contrastive Learning of Skeleton Representations for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
(2022-04-21T00:19:38Z)
- Improving Contrastive Learning with Model Augmentation [123.05700988581806]
Sequential recommendation aims to predict the next items in user behavior sequences, a task that can be addressed by characterizing item relationships within those sequences.
To cope with data sparsity and noise in sequences, a new self-supervised learning (SSL) paradigm has been proposed to improve performance.
(2022-03-25T06:12:58Z)
- ProFormer: Learning Data-efficient Representations of Body Movement with Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays.
We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement.
In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
(2022-02-23T11:11:54Z)
- Contrastively Disentangled Sequential Variational Autoencoder [20.75922928324671]
We propose a novel sequence representation learning method, named Contrastively Disentangled Sequential Variational Autoencoder (C-DSVAE).
We use a novel evidence lower bound that maximizes the mutual information between the input and the latent factors while penalizing the mutual information between the static and dynamic factors.
Our experiments show that C-DSVAE significantly outperforms the previous state-of-the-art methods on multiple metrics.
(2021-10-22T23:00:32Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
(2021-08-05T10:39:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.