PointCMP: Contrastive Mask Prediction for Self-supervised Learning on
Point Cloud Videos
- URL: http://arxiv.org/abs/2305.04075v1
- Date: Sat, 6 May 2023 15:47:48 GMT
- Title: PointCMP: Contrastive Mask Prediction for Self-supervised Learning on
Point Cloud Videos
- Authors: Zhiqiang Shen, Xiaoxiao Sheng, Longguang Wang, Yulan Guo, Qiong Liu,
Xi Zhou
- Abstract summary: We propose a contrastive mask prediction framework for self-supervised learning on point cloud videos.
PointCMP employs a two-branch structure to achieve simultaneous learning of both local and global spatio-temporal information.
Our framework achieves state-of-the-art performance on benchmark datasets and outperforms existing fully supervised counterparts.
- Score: 58.18707835387484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning can extract representations of good quality from
solely unlabeled data, which is appealing for point cloud videos due to their
high labelling cost. In this paper, we propose a contrastive mask prediction
(PointCMP) framework for self-supervised learning on point cloud videos.
Specifically, our PointCMP employs a two-branch structure to achieve
simultaneous learning of both local and global spatio-temporal information. On
top of this two-branch structure, a mutual similarity based augmentation module
is developed to synthesize hard samples at the feature level. By masking
dominant tokens and erasing principal channels, we generate hard samples to
facilitate learning representations with better discrimination and
generalization performance. Extensive experiments show that our PointCMP
achieves state-of-the-art performance on benchmark datasets and outperforms
existing fully supervised counterparts. Transfer learning results demonstrate
the superiority of the learned representations across different datasets and
tasks.
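
The two masking operations described above admit a compact illustration. Below is a minimal PyTorch sketch of masking dominant tokens and erasing principal channels; the similarity scoring against a global embedding, the ratios, and the tensor shapes are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn.functional as F

def mask_dominant_tokens(tokens, global_feat, mask_ratio=0.25):
    """Zero out the tokens most similar to a global embedding.

    tokens:      (B, N, C) token features from one branch.
    global_feat: (B, C) global embedding used to score token "dominance".
    Returns a hard sample in which the most informative tokens are masked.
    """
    sim = torch.einsum('bnc,bc->bn',
                       F.normalize(tokens, dim=-1),
                       F.normalize(global_feat, dim=-1))
    k = max(1, int(mask_ratio * tokens.shape[1]))
    idx = sim.topk(k, dim=1).indices  # indices of the most dominant tokens
    hard = tokens.clone()
    hard.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[2]), 0.0)
    return hard

def erase_principal_channels(feat, erase_ratio=0.25):
    """Zero out the channels with the largest mean activation magnitude."""
    score = feat.abs().mean(dim=tuple(range(feat.dim() - 1)))  # (C,)
    k = max(1, int(erase_ratio * feat.shape[-1]))
    hard = feat.clone()
    hard[..., score.topk(k).indices] = 0.0
    return hard
```

On this reading, the synthesized samples are "hard" because the most discriminative evidence is removed, pushing the encoder to exploit the remaining tokens and channels.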
Related papers
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data.
Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z)
- DenseDINO: Boosting Dense Self-Supervised Learning with Token-Based Point-Level Consistency [12.881617910150688]
We propose a transformer framework for self-supervised learning called DenseDINO to learn dense visual representations.
Specifically, DenseDINO introduces extra input tokens, called reference tokens, to match point-level features with the position prior.
Compared with vanilla DINO, our approach obtains competitive performance on ImageNet classification.
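
A hypothetical sketch of the reference-token idea, assuming a ViT-style token sequence; the module name, the 2-D position encoding, and the number of reference tokens are illustrative assumptions, not DenseDINO's actual implementation.

```python
import torch
import torch.nn as nn

class ReferenceTokens(nn.Module):
    """Prepend extra learnable tokens that carry a positional prior.

    Each reference token is a learned embedding plus an encoding of a
    sampled reference position, so its output feature can be matched to
    point-level features at that location.
    """
    def __init__(self, dim, num_ref=8):
        super().__init__()
        self.ref = nn.Parameter(torch.zeros(num_ref, dim))
        self.pos_proj = nn.Linear(2, dim)  # encode (x, y) reference positions

    def forward(self, patch_tokens, ref_xy):
        # patch_tokens: (B, N, D); ref_xy: (B, num_ref, 2) in [0, 1]
        ref = self.ref.unsqueeze(0) + self.pos_proj(ref_xy)
        return torch.cat([ref, patch_tokens], dim=1)
```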
arXiv Detail & Related papers (2023-06-06T15:04:45Z)
- Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning [26.773995001469505]
We design a point cloud sequence-based Contrastive Prediction and Reconstruction (CPR) framework to collaboratively learn more comprehensive representations.
We conduct experiments on four point cloud sequence benchmarks, and report the results under multiple experimental settings.
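
Read as stated, CPR couples a contrastive prediction objective with a reconstruction objective. A minimal sketch of such a collaborative loss follows; the InfoNCE form, the Chamfer reconstruction term, and the weight `lam` are generic stand-ins inferred from the summary, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def info_nce(pred, target, temperature=0.07):
    """Contrastive prediction: each predicted feature (B, D) should match
    its own target among all targets in the batch."""
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.T / temperature  # (B, B) similarity matrix
    labels = torch.arange(pred.shape[0], device=pred.device)
    return F.cross_entropy(logits, labels)

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets (B, N, 3) and (B, M, 3)."""
    d = torch.cdist(a, b)  # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def cpr_loss(pred_feat, target_feat, recon_pts, gt_pts, lam=1.0):
    # Collaborative objective: contrastive prediction + point reconstruction.
    return info_nce(pred_feat, target_feat) + lam * chamfer(recon_pts, gt_pts)
```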
arXiv Detail & Related papers (2023-05-22T12:09:51Z)
- Point2Vec for Self-Supervised Representation Learning on Point Clouds [66.53955515020053]
We propose point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds.
We extend data2vec to the point cloud domain and report encouraging results on several downstream tasks.
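
data2vec-style pre-training has the student predict a teacher's latent features for masked inputs, with the teacher maintained as an exponential moving average of the student. A minimal sketch under those assumptions (single-layer targets rather than data2vec's averaged top-K layer outputs, and a simple zero-mask) is:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, tau=0.999):
    # The teacher's weights track an exponential moving average of the student's.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(tau).add_(ps, alpha=1.0 - tau)

def data2vec_step(student, teacher, patches, mask):
    """patches: (B, N, C) point-patch embeddings; mask: (B, N) bool, True = masked."""
    with torch.no_grad():
        targets = teacher(patches)  # latent targets from the unmasked input
    preds = student(patches.masked_fill(mask.unsqueeze(-1), 0.0))
    # Regress the teacher's latents only at the masked positions.
    return F.smooth_l1_loss(preds[mask], targets[mask])
```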
arXiv Detail & Related papers (2023-03-29T10:08:29Z)
- Robust Representation Learning by Clustering with Bisimulation Metrics for Visual Reinforcement Learning with Distractions [9.088460902782547]
Clustering with Bisimulation Metrics (CBM) learns robust representations by grouping visual observations in the latent space.
CBM alternates between two steps: (1) grouping observations by measuring their bisimulation distances to the learned prototypes; (2) learning a set of prototypes according to the current cluster assignments.
Experiments demonstrate that CBM significantly improves the sample efficiency of popular visual RL algorithms.
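
A minimal sketch of that alternation, with plain Euclidean distance standing in for the learned bisimulation distance (which in CBM also depends on rewards and transition dynamics):

```python
import torch

def cbm_assign(latents, prototypes):
    """Step 1: assign each observation to its nearest prototype.

    latents: (B, D) encoded observations; prototypes: (K, D).
    L2 distance is a stand-in for the learned bisimulation distance.
    """
    d = torch.cdist(latents, prototypes)  # (B, K) distances
    return d.argmin(dim=1)                # cluster assignment per observation

def cbm_prototype_loss(latents, prototypes, assign):
    """Step 2: pull each prototype toward the observations assigned to it."""
    return ((latents - prototypes[assign]) ** 2).sum(dim=1).mean()
```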
arXiv Detail & Related papers (2023-02-12T13:27:34Z)
- C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action Segmentation [20.182928938110923]
Temporal action segmentation assigns an action label to every frame of an untrimmed input video containing a sequence of multiple actions.
We propose an encoder-decoder-style architecture named C2F-TCN featuring a "coarse-to-fine" ensemble of decoder outputs.
We show that the architecture is flexible for both supervised and representation learning.
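
A minimal sketch of a coarse-to-fine decoder ensemble, assuming per-frame logits at several temporal resolutions; the plain average is an assumption, as the paper may weight the scales differently.

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_ensemble(decoder_outputs, T):
    """Average per-frame predictions from decoders at different temporal resolutions.

    decoder_outputs: list of (B, C, T_i) logits, ordered coarse to fine.
    Each is upsampled to the full sequence length T before averaging.
    """
    ups = [F.interpolate(o, size=T, mode='linear', align_corners=False)
           for o in decoder_outputs]
    return torch.stack(ups, dim=0).mean(dim=0)  # (B, C, T) ensembled logits
```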
arXiv Detail & Related papers (2022-12-20T14:53:46Z)
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3D dataset.
arXiv Detail & Related papers (2022-09-01T12:32:40Z)
- Unsupervised Representation Learning for 3D Point Cloud Data [66.92077180228634]
We propose a simple yet effective approach for unsupervised point cloud learning.
In particular, we identify a very useful transformation which generates a good contrastive version of an original point cloud.
We conduct experiments on three downstream tasks: 3D object classification, shape part segmentation, and scene segmentation.
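
The summary does not name the transformation, so the sketch below uses a generic contrastive view (a random rotation about the up axis plus Gaussian jitter) purely as a stand-in:

```python
import math
import torch

def augment_point_cloud(pc, jitter_sigma=0.01):
    """Generate a contrastive view of a point cloud pc of shape (N, 3).

    Random z-axis rotation and Gaussian jitter are generic stand-ins;
    the paper's actual transformation is not stated in this summary.
    """
    theta = torch.rand(1).item() * 2 * math.pi
    c, s = math.cos(theta), math.sin(theta)
    rot = pc.new_tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return pc @ rot.T + jitter_sigma * torch.randn_like(pc)
```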
arXiv Detail & Related papers (2021-10-13T10:52:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.