Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D
Action Representation Learning
- URL: http://arxiv.org/abs/2308.03975v1
- Date: Tue, 8 Aug 2023 01:27:55 GMT
- Title: Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D
Action Representation Learning
- Authors: Jiahang Zhang, Lilang Lin, Jiaying Liu
- Abstract summary: We propose Prompted Contrast with Masked Motion Modeling, PCM$^{\rm 3}$, for versatile 3D action representation learning.
Our method integrates contrastive learning and masked prediction tasks in a mutually beneficial manner.
Experiments on five downstream tasks across three large-scale datasets demonstrate the superior generalization capacity of PCM$^{\rm 3}$ compared to state-of-the-art works.
- Score: 33.68311764817763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has proved effective for skeleton-based human action
understanding, which is an important yet challenging topic. Previous works
mainly rely on contrastive learning or masked motion modeling paradigm to model
the skeleton relations. However, the sequence-level and joint-level
representation learning cannot be effectively and simultaneously handled by
these methods. As a result, the learned representations fail to generalize to
different downstream tasks. Moreover, combining these two paradigms in a naive
manner leaves the synergy between them untapped and can lead to interference in
training. To address these problems, we propose Prompted Contrast with Masked
Motion Modeling, PCM$^{\rm 3}$, for versatile 3D action representation
learning. Our method integrates the contrastive learning and masked prediction
tasks in a mutually beneficial manner, which substantially boosts the
generalization capacity for various downstream tasks. Specifically, masked
prediction provides novel training views for contrastive learning, which in
turn guides the masked prediction training with high-level semantic
information. Moreover, we propose a dual-prompted multi-task pretraining
strategy, which further improves model representations by reducing the
interference caused by learning the two different pretext tasks. Extensive
experiments on five downstream tasks across three large-scale datasets
demonstrate the superior generalization capacity of PCM$^{\rm 3}$ compared to
state-of-the-art works. Our project is publicly available at:
https://jhang2020.github.io/Projects/PCM3/PCM3.html .
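To make the coupling of the two pretext tasks concrete, below is a minimal PyTorch sketch of a PCM$^{\rm 3}$-style pretraining step as we read it from the abstract: two augmented views are contrasted with InfoNCE, a masked view is reconstructed, and the reconstruction is reused as an extra positive view so that the contrastive branch also guides the masked-prediction branch with high-level semantics. Every module name, shape, and loss weight here is our own assumption rather than the authors' released code, and the dual-prompted multi-task strategy is omitted for brevity.

```python
# Sketch only: a PCM^3-style joint pretraining step under our assumptions.
import torch
import torch.nn.functional as F
from torch import nn

class Encoder(nn.Module):
    """Toy per-frame skeleton encoder: (B, T, in_dim) -> pooled (B, dim)."""
    def __init__(self, in_dim=75, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(), nn.Linear(dim, dim))
    def forward(self, x, pool=True):
        h = self.net(x)                       # (B, T, dim) frame features
        return h.mean(dim=1) if pool else h   # temporal average pooling

def info_nce(q, k, tau=0.1):
    """InfoNCE: matched rows of q and k are positives, the rest negatives."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    logits = q @ k.t() / tau
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def pretrain_step(encoder, decoder, x1, x2, mask, w=1.0):
    """x1, x2: two augmented views (B, T, in_dim); mask: binary, 1 = visible."""
    z1, z2 = encoder(x1), encoder(x2)
    # Masked motion modeling: reconstruct the joints hidden by the mask.
    recon = decoder(encoder(x2 * mask, pool=False))    # (B, T, in_dim)
    loss_mask = F.mse_loss(recon * (1 - mask), x2 * (1 - mask))
    # Reuse the reconstruction as a novel positive view for contrast; the
    # gradient flowing back gives the masked branch high-level guidance.
    z_rec = encoder(recon)
    loss_con = info_nce(z1, z2) + info_nce(z1, z_rec)
    return loss_con + w * loss_mask

# usage (hypothetical shapes: batch 8, 64 frames, 25 joints x 3 coords)
enc, dec = Encoder(), nn.Linear(256, 75)
x1, x2 = torch.randn(8, 64, 75), torch.randn(8, 64, 75)
mask = (torch.rand(8, 64, 1) > 0.4).float().expand(-1, -1, 75)
loss = pretrain_step(enc, dec, x1, x2, mask)
loss.backward()
```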
Related papers
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can mitigate the data hunger of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning [37.155772047656114]
The Masked Scene Contrast (MSC) framework extracts comprehensive 3D representations more efficiently and effectively.
MSC also enables large-scale 3D pre-training across multiple datasets.
arXiv Detail & Related papers (2023-03-24T17:59:58Z)
- Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining [26.908554018069545]
We propose Contrast with Reconstruct (ReCon) that unifies contrastive and generative modeling paradigms.
An encoder-decoder style ReCon-block transfers knowledge through cross attention with stop-gradient (a sketch of this pattern appears after this list).
ReCon achieves a new state-of-the-art in 3D representation learning, e.g., 91.26% accuracy on ScanObjectNN.
arXiv Detail & Related papers (2023-02-05T06:58:35Z)
- Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis [15.623293264871181]
This study investigates approaches to improving modality representations with contrastive learning.
We devise a three-stage framework with multi-view contrastive learning to refine representations for the specific objectives.
We conduct experiments on three open datasets, and the results demonstrate the advantages of our model.
arXiv Detail & Related papers (2022-10-28T01:25:16Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation under Adversarial Attack [73.3371797787823]
Adversarial contrastive learning (ACL) is considered an effective way to improve the robustness of pre-trained models.
We present a robustness-aware loss function to train the self-supervised contrastive learning framework adversarially.
We validate our method, PointACL, on downstream tasks including 3D classification and 3D segmentation across multiple datasets.
arXiv Detail & Related papers (2022-09-14T22:58:31Z)
- SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z)
- UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes [91.24112204588353]
We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks.
In contrast to previous models, UViM has the same functional form for all tasks.
We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks.
arXiv Detail & Related papers (2022-05-20T17:47:59Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment to learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition [16.22360992454675]
Action recognition via 3D skeleton data has emerged as an important topic in recent years.
In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL, built on a momentum LSTM encoder (a momentum-update sketch appears after this list).
Our approach typically improves existing hand-crafted methods by 10-50% top-1 accuracy.
arXiv Detail & Related papers (2020-08-01T06:37:57Z)
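As referenced in the Contrast with Reconstruct (ReCon) entry above, the following is a loose sketch of a cross-attention block with stop-gradient for teacher-to-student knowledge transfer. The module name, shapes, and normalization placement are our own assumptions, not the paper's implementation.

```python
# Sketch: student tokens query frozen teacher tokens via cross attention.
import torch
from torch import nn

class CrossAttnTransfer(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
    def forward(self, student, teacher):
        kv = teacher.detach()  # stop-gradient: knowledge flows teacher -> student only
        out, _ = self.attn(query=student, key=kv, value=kv)
        return self.norm(student + out)  # residual connection + post-norm

# usage with made-up shapes: 8 samples, 64 student / 32 teacher tokens
student, teacher = torch.randn(8, 64, 256), torch.randn(8, 32, 256)
fused = CrossAttnTransfer()(student, teacher)  # (8, 64, 256)
```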
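Likewise, the momentum-updated encoder referenced in the AS-CAL entry can be illustrated with a MoCo-style exponential-moving-average update. This is a generic sketch under our own assumptions: an LSTM stands in for the momentum LSTM, and the momentum coefficient is illustrative.

```python
# Sketch: MoCo-style momentum update for a key encoder over skeleton sequences.
import copy
import torch
from torch import nn

# query encoder is trained by backprop; key encoder only by momentum updates
query_enc = nn.LSTM(input_size=75, hidden_size=256, batch_first=True)
key_enc = copy.deepcopy(query_enc)
for p in key_enc.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def momentum_update(q_enc, k_enc, m=0.999):
    """EMA update: k = m * k + (1 - m) * q."""
    for pq, pk in zip(q_enc.parameters(), k_enc.parameters()):
        pk.mul_(m).add_(pq, alpha=1.0 - m)
```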
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.