GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
- URL: http://arxiv.org/abs/2305.08808v1
- Date: Mon, 15 May 2023 17:14:55 GMT
- Title: GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
- Authors: Xiaoyu Tian, Haoxi Ran, Yue Wang, Hang Zhao
- Abstract summary: We introduce a point cloud representation learning framework based on geometric feature reconstruction.
We identify three self-supervised learning objectives peculiar to point clouds, namely centroid prediction, normal estimation, and curvature prediction.
Our pipeline is conceptually simple and consists of two major steps: first, it randomly masks out groups of points and encodes the rest with a Transformer-based point cloud encoder; second, a lightweight Transformer decoder predicts geometric targets for points in each voxel.
- Score: 16.825524577372473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tries to address a fundamental question in point cloud
self-supervised learning: what is a good signal we should leverage to learn
features from point clouds without annotations? To answer that, we introduce a
point cloud representation learning framework based on geometric feature
reconstruction. In contrast to recent papers that directly adopt masked
autoencoder (MAE) and only predict original coordinates or occupancy from
masked point clouds, our method revisits differences between images and point
clouds and identifies three self-supervised learning objectives peculiar to
point clouds, namely centroid prediction, normal estimation, and curvature
prediction. Combined with occupancy prediction, these four objectives yield a
nontrivial self-supervised learning task, and they mutually reinforce one
another, helping models better reason about the fine-grained geometry of point
clouds. Our pipeline is conceptually simple and consists of two major steps:
first, it randomly
masks out groups of points, followed by a Transformer-based point cloud
encoder; second, a lightweight Transformer decoder predicts centroid, normal,
and curvature for points in each voxel. We transfer the pre-trained Transformer
encoder to a downstream perception model. On the nuScenes dataset, our model
achieves a 3.38 mAP improvement for object detection, a 2.1 mIoU gain for
segmentation, and a 1.7 AMOTA gain for multi-object tracking. We also conduct
experiments on the Waymo Open Dataset and achieve significant performance
improvements over baselines as well.
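For concreteness, here is a minimal sketch of how the three geometric targets could be computed for one group of points. The PCA-based normal and surface-variation curvature are standard local-geometry estimates; the function below is illustrative only, not the authors' implementation.

```python
import numpy as np

def geometric_targets(points: np.ndarray):
    """Compute centroid, normal, and curvature for one point group (N, 3).

    Illustrative sketch of GeoMAE-style prediction targets: the normal is
    taken as the smallest-eigenvalue direction of the local covariance,
    and curvature is the standard surface-variation ratio.
    """
    centroid = points.mean(axis=0)                      # centroid target
    centered = points - centroid
    cov = centered.T @ centered / len(points)           # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues ascending
    normal = eigvecs[:, 0]                              # normal estimate
    curvature = eigvals[0] / max(eigvals.sum(), 1e-12)  # surface variation
    return centroid, normal, curvature

# Example: c, n, k = geometric_targets(np.random.rand(128, 3))
```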
Related papers
- Point Cloud Pre-training with Diffusion Models [62.12279263217138]
We propose a novel pre-training method called Point cloud Diffusion pre-training (PointDif).
PointDif achieves substantial improvement across various real-world datasets for diverse downstream tasks such as classification, segmentation and detection.
arXiv Detail & Related papers (2023-11-25T08:10:05Z)
- Clustering based Point Cloud Representation Learning for 3D Analysis [80.88995099442374]
We propose a clustering based supervised learning scheme for point cloud analysis.
Unlike the current de facto scene-wise training paradigm, our algorithm conducts within-class clustering in the point embedding space.
Our algorithm shows notable improvements on well-known point cloud segmentation datasets.
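A minimal sketch of what within-class clustering on a point embedding space could look like, using a few plain k-means steps per semantic class; this is an assumption-laden simplification, not the authors' exact training scheme.

```python
import torch

def within_class_centroids(emb, labels, k=4, iters=10):
    """Run a few k-means steps separately inside each semantic class,
    yielding up to k sub-class centroids per class.

    emb: (N, D) point embeddings, labels: (N,) class ids.
    Illustrative sketch only; the paper's scheme may differ.
    """
    centroids = {}
    for c in labels.unique():
        x = emb[labels == c]                            # this class's embeddings
        cent = x[torch.randperm(len(x))[:k]].clone()    # random initialization
        for _ in range(iters):
            assign = torch.cdist(x, cent).argmin(dim=1) # nearest sub-centroid
            for j in range(len(cent)):
                mask = assign == j
                if mask.any():
                    cent[j] = x[mask].mean(dim=0)       # recompute centroid
        centroids[int(c)] = cent
    return centroids
```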
arXiv Detail & Related papers (2023-07-27T03:42:12Z)
- PointPatchMix: Point Cloud Mixing with Patch Scoring [58.58535918705736]
We propose PointPatchMix, which mixes point clouds at the patch level and generates content-based targets for mixed point clouds.
Our approach preserves local features at the patch level, while the patch scoring module assigns targets based on the content-based significance score from a pre-trained teacher model.
With Point-MAE as our baseline, our model surpasses previous methods by a significant margin, achieving 86.3% accuracy on ScanObjectNN and 94.1% accuracy on ModelNet40.
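A hedged sketch of the patch-level mixing idea with content-based label weights from teacher scores; the function name, shapes, and signature are assumptions for illustration, not the paper's API.

```python
import torch

def point_patch_mix(patches_a, patches_b, scores_a, scores_b, ratio=0.5):
    """Mix two clouds at the patch level and weight the mixed label by
    teacher-assigned patch significance scores.

    patches_*: (P, K, 3) point patches; scores_*: (P,) teacher scores.
    Illustrative sketch of the PointPatchMix idea only.
    """
    P = patches_a.shape[0]
    swap = torch.randperm(P)[: int(P * ratio)]          # patches taken from b
    mixed = patches_a.clone()
    mixed[swap] = patches_b[swap]                       # patch-level replacement
    kept = torch.ones(P, dtype=torch.bool)
    kept[swap] = False
    mass_a = scores_a[kept].sum()                       # significance kept from a
    mass_b = scores_b[swap].sum()                       # significance taken from b
    w_b = mass_b / (mass_a + mass_b)                    # content-based target weight
    return mixed, 1.0 - w_b, w_b                        # mixed cloud, label weights
```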
arXiv Detail & Related papers (2023-03-12T14:49:42Z)
- AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers [94.11915008006483]
We present a new method that reformulates point cloud completion as a set-to-set translation problem.
We design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion.
Our method attains 6.53 CD on PCN, 0.81 CD on ShapeNet-55 and 0.392 MMD on real-world KITTI.
arXiv Detail & Related papers (2023-01-11T16:14:12Z)
- Upsampling Autoencoder for Self-Supervised Point Cloud Learning [11.19408173558718]
We propose a self-supervised pretraining model for point cloud learning without human annotations.
The upsampling operation encourages the network to capture both high-level semantic information and low-level geometric information of the point cloud.
We find that our UAE outperforms previous state-of-the-art methods in shape classification, part segmentation and point cloud upsampling tasks.
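A minimal sketch of such an upsampling pretext objective, assuming generic encoder/upsampler callables and a naive Chamfer distance; names and signatures here are illustrative, not the paper's modules.

```python
import torch

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3);
    a simple O(N*M) reference implementation."""
    d = torch.cdist(a, b)                                # pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def upsampling_pretext_loss(encoder, upsampler, points, keep=0.25):
    """Illustrative upsampling-autoencoder pretext: subsample the cloud,
    encode the sparse version, and train an upsampler to recover the
    dense original (assumed callables, not the paper's API)."""
    n = points.shape[0]
    sparse = points[torch.randperm(n)[: int(n * keep)]]  # low-res input
    feats = encoder(sparse)                              # latent features
    dense_pred = upsampler(sparse, feats)                # predicted dense cloud
    return chamfer(dense_pred, points)                   # reconstruction loss
```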
arXiv Detail & Related papers (2022-03-21T07:20:37Z)
- Graph-Guided Deformation for Point Cloud Completion [35.10606375236494]
We propose a Graph-Guided Deformation Network, which respectively regards the input data and intermediate generation as controlling and supporting points.
Our key insight is to simulate the least square Laplacian deformation process via mesh deformation methods, which brings adaptivity for modeling variation in geometry details.
We are the first to refine the point cloud completion task by mimicking traditional graphics algorithms with GCN-guided deformation.
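For reference, the least-squares Laplacian deformation the paper says it simulates can be sketched directly; the uniform-weight graph Laplacian and dense solver below are a textbook simplification, not the paper's GCN formulation.

```python
import numpy as np

def laplacian_deform(verts, edges, anchors, anchor_pos, w=1.0):
    """Classic least-squares Laplacian deformation sketch.

    verts: (N, 3) positions, edges: iterable of (i, j) index pairs,
    anchors: indices of constrained vertices, anchor_pos: their targets.
    """
    n = len(verts)
    L = np.zeros((n, n))
    for i, j in edges:                                  # uniform Laplacian weights
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    delta = L @ verts                                   # differential coordinates
    C = np.zeros((len(anchors), n))                     # soft positional constraints
    for row, a in enumerate(anchors):
        C[row, a] = w
    A = np.vstack([L, C])
    b = np.vstack([delta, w * np.asarray(anchor_pos)])
    new_verts, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares solve
    return new_verts
```

Solving in the least-squares sense preserves the differential (detail) coordinates while pulling the anchored vertices toward their targets, which is the adaptivity the paper attributes to this process.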
arXiv Detail & Related papers (2021-11-11T12:55:26Z)
- Point Cloud Pre-training by Mixing and Disentangling [35.18101910728478]
Mixing and Disentangling (MD) is a self-supervised learning approach for point cloud pre-training.
We show that an encoder pre-trained with MD significantly surpasses the same encoder trained from scratch and converges quickly.
We hope this self-supervised learning attempt on point clouds can pave the way for reducing deep models' dependence on large-scale labeled data.
arXiv Detail & Related papers (2021-09-01T15:52:18Z)
- PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [81.71904691925428]
We present a new method that reformulates point cloud completion as a set-to-set translation problem.
We also design a new model, called PoinTr, that adopts a transformer encoder-decoder architecture for point cloud completion.
Our method outperforms state-of-the-art methods by a large margin on both the new benchmarks and the existing ones.
arXiv Detail & Related papers (2021-08-19T17:58:56Z)
- Planning with Learned Dynamic Model for Unsupervised Point Cloud Registration [25.096635750142227]
We develop a latent dynamic model of point clouds, consisting of a transformation network and an evaluation network.
We employ the cross-entropy method (CEM) to iteratively update the planning policy by maximizing the rewards in the point cloud registration process.
Experimental results on ModelNet40 and 7Scene benchmark datasets demonstrate that our method can yield good registration performance in an unsupervised manner.
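A generic sketch of the cross-entropy method loop described here, with an assumed reward function over transform parameters; all names are illustrative, not the paper's API.

```python
import numpy as np

def cem_optimize(reward_fn, dim=6, iters=10, pop=64, n_elite=8):
    """Cross-entropy method: repeatedly sample candidate transform
    parameters (e.g. 3 rotation + 3 translation values), keep the
    highest-reward elites, and refit the sampling distribution.
    reward_fn is an assumed callable scoring a parameter vector."""
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = mu + sigma * np.random.randn(pop, dim)   # candidate transforms
        rewards = np.array([reward_fn(s) for s in samples])
        elite = samples[np.argsort(rewards)[-n_elite:]]    # top-reward samples
        mu = elite.mean(axis=0)                            # refit the sampling
        sigma = elite.std(axis=0) + 1e-6                   # distribution
    return mu                                              # best transform params
```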
arXiv Detail & Related papers (2021-08-05T13:47:11Z)
- Refinement of Predicted Missing Parts Enhance Point Cloud Completion [62.997667081978825]
Point cloud completion is the task of predicting complete geometry from partial observations using a point set representation for a 3D shape.
Previous approaches propose neural networks to directly estimate the whole point cloud through encoder-decoder models fed by the incomplete point set.
This paper proposes an end-to-end neural network architecture that focuses on computing the missing geometry and merging the known input and the predicted point cloud.
arXiv Detail & Related papers (2020-10-08T22:01:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.