M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and
Siamese Decoders
- URL: http://arxiv.org/abs/2309.13235v1
- Date: Sat, 23 Sep 2023 02:19:21 GMT
- Title: M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and
Siamese Decoders
- Authors: Qibo Qiu, Honghui Yang, Wenxiao Wang, Shun Zhang, Haiming Gao, Haochao
Ying, Wei Hua, Xiaofei He
- Abstract summary: Masked point modeling has become a promising scheme for self-supervised pre-training on point clouds.
M$^3$CS is proposed to equip the model with both low- and high-level representation modeling capabilities.
- Score: 19.68592678093725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked point modeling has become a promising scheme for self-supervised
pre-training on point clouds. Existing methods reconstruct either the original
points or related features as the pre-training objective. However, given the
diversity of downstream tasks, the model needs both low- and high-level
representation modeling capabilities to capture geometric details and semantic
contexts during pre-training. To this end, M$^3$CS is proposed to equip the
model with both abilities. Specifically, with a masked point cloud as input,
M$^3$CS introduces two decoders to predict masked representations and the
original points simultaneously. Since an extra decoder would double the
parameters of the decoding process and could lead to overfitting, we propose
siamese decoders to keep the number of learnable parameters unchanged. Further,
we propose an online codebook that projects continuous tokens into discrete
ones before reconstructing masked points. In this way, the decoder is forced to
work through combinations of tokens rather than memorizing each token.
Comprehensive experiments show that M$^3$CS achieves superior performance on
both classification and segmentation tasks, outperforming existing methods.
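Below is a minimal PyTorch sketch of the two mechanisms the abstract describes: siamese decoders that share one set of weights across the point-reconstruction and feature-prediction branches, and an online codebook that discretizes continuous tokens via a VQ-VAE-style nearest-code lookup with a straight-through gradient. All module names, shapes, loss terms, and the stand-in teacher features are illustrative assumptions inferred from the abstract, not the authors' released implementation.

```python
# Illustrative sketch only: names, shapes, and losses are assumptions
# inferred from the abstract, not the official M^3CS code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OnlineCodebook(nn.Module):
    """Projects continuous token features onto the nearest learnable
    code vector (VQ-VAE-style), so targets become discrete codes."""

    def __init__(self, num_codes: int = 512, dim: int = 384):
        super().__init__()
        self.codes = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # Squared Euclidean distance between tokens (B, N, D) and codes (K, D).
        c = self.codes.weight
        dist = z.pow(2).sum(-1, keepdim=True) - 2 * z @ c.t() + c.pow(2).sum(-1)
        z_q = self.codes(dist.argmin(dim=-1))  # nearest code per token
        # VQ-style losses keep the codebook updated online.
        vq_loss = F.mse_loss(z_q, z.detach()) + 0.25 * F.mse_loss(z, z_q.detach())
        # Straight-through estimator: forward uses z_q, gradients flow to z.
        return z + (z_q - z).detach(), vq_loss


class SiameseDecoders(nn.Module):
    """One decoder applied twice with branch-specific queries and heads,
    so the second pre-training target adds no extra decoder parameters."""

    def __init__(self, dim: int = 384, depth: int = 2, group_size: int = 32):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=depth)
        self.point_query = nn.Parameter(torch.zeros(1, 1, dim))
        self.feat_query = nn.Parameter(torch.zeros(1, 1, dim))
        self.point_head = nn.Linear(dim, group_size * 3)  # low-level: xyz
        self.feat_head = nn.Linear(dim, dim)              # high-level: tokens

    def forward(self, tokens):
        # Same decoder weights for both branches (siamese).
        h_pts = self.shared(tokens + self.point_query)
        h_feat = self.shared(tokens + self.feat_query)
        return self.point_head(h_pts), self.feat_head(h_feat)


if __name__ == "__main__":
    B, N, D = 2, 64, 384
    tokens = torch.randn(B, N, D)            # stand-in for encoder output
    target_pts = torch.randn(B, N, 32 * 3)   # stand-in ground-truth patches
    teacher_feats = torch.randn(B, N, D)     # stand-in feature targets

    codebook, decoders = OnlineCodebook(dim=D), SiameseDecoders(dim=D)
    pred_pts, pred_feats = decoders(tokens)
    discrete_targets, vq_loss = codebook(teacher_feats)

    # Dual-target objective: points + discretized features (+ codebook loss).
    loss = (F.mse_loss(pred_pts, target_pts)
            + F.mse_loss(pred_feats, discrete_targets.detach())
            + vq_loss)
    loss.backward()
    print(f"combined pre-training loss: {loss.item():.4f}")
```

In the actual method the two branches would likely receive different inputs (e.g., visible tokens plus mask queries) and the feature targets would come from the model rather than random tensors; the sketch only fixes the parameter-sharing and discretization structure.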
Related papers
- Triple Point Masking [49.39218611030084]
Existing 3D mask learning methods encounter performance bottlenecks under limited data.
We introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders.
Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks.
arXiv Detail & Related papers (2024-09-26T05:33:30Z)
- Pre-training Point Cloud Compact Model with Partial-aware Reconstruction [51.403810709250024]
We present a pre-trained Point cloud Compact model with Partial-aware Reconstruction, named Point-CPR.
Our model exhibits strong performance across various tasks, especially surpassing the leading MPM-based model PointGPT-B with only 2% of its parameters.
arXiv Detail & Related papers (2024-07-12T15:18:14Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders [55.00904795497786]
We propose TimeMAE, a novel self-supervised paradigm for learning transferable time series representations based on transformer networks.
TimeMAE learns enriched contextual representations of time series with a bidirectional encoding scheme.
To solve the discrepancy issue incurred by newly injected masked embeddings, we design a decoupled autoencoder architecture.
arXiv Detail & Related papers (2023-03-01T08:33:16Z)
- EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder [60.52613206271329]
This paper introduces Efficient Point Cloud Learning (EPCL) for training high-quality point cloud models with a frozen CLIP transformer.
Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data.
arXiv Detail & Related papers (2022-12-08T06:27:11Z)
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.