Pre-training Point Cloud Compact Model with Partial-aware Reconstruction
- URL: http://arxiv.org/abs/2407.09344v1
- Date: Fri, 12 Jul 2024 15:18:14 GMT
- Title: Pre-training Point Cloud Compact Model with Partial-aware Reconstruction
- Authors: Yaohua Zha, Yanzi Wang, Tao Dai, Shu-Tao Xia
- Abstract summary: We present a pre-trained Point cloud Compact Model with Partial-aware Reconstruction, named Point-CPR.
Our model exhibits strong performance across various tasks, especially surpassing the leading MPM-based model PointGPT-B with only 2% of its parameters.
- Score: 51.403810709250024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained point cloud models based on Masked Point Modeling (MPM) have exhibited substantial improvements across various tasks. However, two drawbacks hinder their practical application. Firstly, the positional embedding of masked patches in the decoder leaks their central coordinates, leading to limited 3D representations. Secondly, the excessive model size of existing MPM methods places higher demands on devices. To address these, we propose to pre-train a Point cloud Compact Model with Partial-aware Reconstruction, named Point-CPR. Specifically, in the decoder, we couple the vanilla masked tokens with their positional embeddings as randomly masked queries and introduce a partial-aware prediction module before each decoder layer to predict them from the unmasked partial point cloud. This prevents the decoder from creating a shortcut between the central coordinates of masked patches and their reconstructed coordinates, enhancing the robustness of the model. We also devise a compact encoder composed of local aggregation layers and MLPs, reducing the parameters and computational requirements compared to existing Transformer-based encoders. Extensive experiments demonstrate that our model exhibits strong performance across various tasks, especially surpassing the leading MPM-based model PointGPT-B with only 2% of its parameters.
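To make the abstract's two design points concrete, here is a minimal PyTorch-style sketch based only on the description above; it is not the authors' implementation, and the module names, the k-nearest-neighbor max-pooling form of local aggregation, and the cross-attention form of the partial-aware predictor are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class LocalAggregation(nn.Module):
    """Compact encoder block (sketch): each patch token is max-pooled with
    its k nearest patch centers, then mixed by an MLP; no self-attention,
    which keeps parameters and compute low."""

    def __init__(self, dim: int, k: int = 8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, x: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) patch tokens; centers: (B, N, 3) patch centers
        dist = torch.cdist(centers, centers)            # (B, N, N) pairwise
        idx = dist.topk(self.k, largest=False).indices  # (B, N, k) k-NN ids
        gather_idx = idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1))
        neighbors = torch.gather(                       # (B, N, k, C)
            x.unsqueeze(1).expand(-1, x.size(1), -1, -1), 2, gather_idx
        )
        pooled = neighbors.max(dim=2).values            # (B, N, C)
        return x + self.mlp(torch.cat([x, pooled], dim=-1))


class PartialAwarePrediction(nn.Module):
    """Decoder-side module (sketch): masked queries are predicted from the
    unmasked (partial) tokens via cross-attention, so no positional
    embedding of masked patch centers ever enters the decoder."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, masked_q: torch.Tensor, unmasked_kv: torch.Tensor):
        # masked_q: (B, M, C) randomly initialized masked queries
        # unmasked_kv: (B, N-M, C) encoded tokens of the visible partial
        out, _ = self.attn(masked_q, unmasked_kv, unmasked_kv)
        return masked_q + out
```

In a full model, a `PartialAwarePrediction` step would run before each decoder layer, and patch coordinates would be regressed only from the resulting queries, which is what blocks the positional-embedding shortcut the abstract describes.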
Related papers
- Triple Point Masking [49.39218611030084]
Existing 3D mask learning methods encounter performance bottlenecks under limited data.
We introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders.
Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks.
arXiv Detail & Related papers (2024-09-26T05:33:30Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling [47.94285833315427]
We propose a Locally constrained Compact point cloud Model (LCM) consisting of a locally constrained compact encoder and a locally constrained Mamba-based decoder.
The encoder replaces self-attention with local aggregation layers to achieve an elegant balance between performance and efficiency (a rough parameter comparison appears after this list).
The Mamba-based decoder ensures linear complexity while maximizing the perception of point cloud geometry information from unmasked patches, which carry higher information density.
arXiv Detail & Related papers (2024-05-27T13:19:23Z)
- M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders [19.68592678093725]
Masked point modeling has become a promising scheme of self-supervised pre-training for point clouds.
M$^3$CS is proposed to this end, combining multi-target masked point modeling with a learnable codebook and Siamese decoders.
arXiv Detail & Related papers (2023-09-23T02:19:21Z)
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z)
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [104.82953953453503]
We present Point-BERT, a new paradigm for learning Transformers that generalizes the concept of BERT to 3D point clouds.
Experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers.
arXiv Detail & Related papers (2021-11-29T18:59:03Z)
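As noted in the LCM entry above, both Point-CPR's compact encoder and LCM's locally constrained encoder trade self-attention for local aggregation. The following self-contained snippet gives a rough per-block parameter comparison; the sizes (dim 384, 6 heads) are illustrative assumptions, not either paper's actual configuration.

```python
import torch.nn as nn

dim = 384
attn_block = nn.MultiheadAttention(dim, num_heads=6)  # QKV + output proj
local_block = nn.Sequential(                          # MLP over a k-NN pool
    nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
)


def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())


print(n_params(attn_block), n_params(local_block))    # 591360 vs. 443136
```

Beyond the parameter count, dropping attention also removes the quadratic attention-score computation from every encoder block, which is the efficiency argument both papers make.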