PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds
- URL: http://arxiv.org/abs/2311.04501v1
- Date: Wed, 8 Nov 2023 07:26:09 GMT
- Title: PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds
- Authors: Hao Yang, Haiyang Wang, Di Dai, Liwei Wang
- Abstract summary: We propose PRED, a novel image-assisted pre-training framework for outdoor point clouds.
The main ingredient of our framework is semantic rendering conditioned on Birds-Eye-View (BEV) feature maps.
We further enhance our model's performance by incorporating point-wise masking with a high mask ratio.
- Score: 18.840000859663153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training is crucial in 3D-related fields such as autonomous driving where
point cloud annotation is costly and challenging. Many recent studies on point
cloud pre-training, however, have overlooked the issue of incompleteness, where
only a fraction of the points are captured by LiDAR, leading to ambiguity
during the training phase. On the other hand, images offer more comprehensive
information and richer semantics that can bolster point cloud encoders in
addressing the incompleteness issue inherent in point clouds. Yet,
incorporating images into point cloud pre-training presents its own challenges
due to occlusions, potentially causing misalignments between points and pixels.
In this work, we propose PRED, a novel image-assisted pre-training framework
for outdoor point clouds in an occlusion-aware manner. The main ingredient of
our framework is semantic rendering conditioned on Birds-Eye-View (BEV) feature
maps, which leverages the semantics of images for supervision through neural
rendering. We further enhance our model's performance by incorporating
point-wise masking with a high mask ratio (95%). Extensive experiments
demonstrate PRED's superiority over prior point cloud pre-training methods,
providing significant improvements on various large-scale datasets for 3D
perception tasks. Codes will be available at https://github.com/PRED4pc/PRED.
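To make the recipe above concrete, here is a minimal PyTorch sketch of the two ingredients the abstract names: point-wise masking at a 95% ratio, and image-semantics supervision queried from a BEV feature map. All function names, grid sizes, and the single-lookup loss (which collapses the paper's ray-based neural rendering into one query) are illustrative assumptions, not PRED's actual implementation.

```python
import torch
import torch.nn.functional as F

def mask_points(points: torch.Tensor, mask_ratio: float = 0.95):
    """Point-wise masking: randomly keep only (1 - mask_ratio) of the points.

    points: (N, 3) LiDAR coordinates; returns the visible subset and indices.
    """
    n = points.shape[0]
    keep = max(1, int(n * (1.0 - mask_ratio)))
    kept_idx = torch.randperm(n)[:keep]
    return points[kept_idx], kept_idx

def scatter_to_bev(points, feats, grid=(128, 128), extent=50.0):
    """Average per-point features into a BEV grid (one illustrative pooling)."""
    h, w = grid
    ix = ((points[:, 0] + extent) / (2 * extent) * w).long().clamp(0, w - 1)
    iy = ((points[:, 1] + extent) / (2 * extent) * h).long().clamp(0, h - 1)
    cell = iy * w + ix                                    # flat cell index
    c = feats.shape[1]
    bev = torch.zeros(h * w, c).index_add_(0, cell, feats)
    cnt = torch.zeros(h * w, 1).index_add_(0, cell, torch.ones(len(cell), 1))
    return (bev / cnt.clamp(min=1)).view(h, w, c)

def semantic_rendering_loss(bev, query_cells, pixel_labels, head):
    """Supervise BEV features with image semantics: query the BEV map at
    cells tied to camera pixels and classify against labels produced by a
    2D semantic model. Collapses ray-based rendering into a single lookup."""
    feats = bev[query_cells[:, 1], query_cells[:, 0]]     # (M, C)
    return F.cross_entropy(head(feats), pixel_labels)
```

In a pre-training loop under these assumptions, one would mask the raw cloud, encode the visible points, pool their features with scatter_to_bev, and minimize semantic_rendering_loss against labels from a frozen 2D segmentation model.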
Related papers
- Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers [38.08724410736292]
This paper leverages pre-trained models with 2D prior knowledge to tackle 3D point cloud analysis.
We propose the Adaptive PointFormer (APF), which fine-tunes pre-trained 2D models with only a modest number of parameters to directly process point clouds.
arXiv Detail & Related papers (2024-07-18T06:32:45Z)
- ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud [7.066196862701362]
We propose an unsupervised model adaptation approach to enhance the point cloud encoder for extremely sparse point clouds.
We propose a novel fused-cross attention layer that expands the pre-trained self-attention layer with additional learnable tokens and attention blocks.
We also propose a complementary learning-based self-distillation scheme that encourages the modified features to be pushed apart from irrelevant text embeddings.
arXiv Detail & Related papers (2024-04-30T15:42:45Z)
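A minimal sketch of the expansion idea described above: a frozen pre-trained self-attention layer is widened with additional learnable tokens. The token count, initialization, and wiring are assumptions; ESP-Zero's actual fused-cross attention and its extra attention blocks are not reproduced here.

```python
import torch
import torch.nn as nn

class FusedCrossAttention(nn.Module):
    """Expand a pre-trained self-attention layer with extra learnable
    tokens, loosely in the spirit of ESP-Zero's fused-cross attention."""

    def __init__(self, pretrained_attn: nn.MultiheadAttention,
                 dim: int, num_extra_tokens: int = 8):
        super().__init__()
        self.attn = pretrained_attn                  # frozen pre-trained layer
        for p in self.attn.parameters():
            p.requires_grad = False
        self.extra = nn.Parameter(torch.randn(num_extra_tokens, dim) * 0.02)

    def forward(self, x):                            # x: (B, N, dim) tokens
        extra = self.extra.unsqueeze(0).expand(x.shape[0], -1, -1)
        seq = torch.cat([extra, x], dim=1)           # prepend learnable tokens
        out, _ = self.attn(seq, seq, seq, need_weights=False)
        return out[:, self.extra.shape[0]:]          # keep the point positions

# e.g. layer = FusedCrossAttention(
#     nn.MultiheadAttention(256, 8, batch_first=True), dim=256)
```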
- HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
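The following is a generic image-to-point feature distillation loss in the same spirit; HVDistill's hybrid-view formulation (distilling from both the image plane and a BEV view) is richer than this single-view sketch, and the projection index is assumed to come from camera calibration.

```python
import torch
import torch.nn.functional as F

def image_to_point_distill_loss(point_feats, pixel_feats, point_to_pixel):
    """Pull each point's 3D feature toward the 2D feature of the pixel it
    projects to. point_feats: (N, C), pixel_feats: (P, C),
    point_to_pixel: (N,) long index from calibration."""
    target = pixel_feats[point_to_pixel].detach()   # frozen 2D teacher
    p = F.normalize(point_feats, dim=-1)
    t = F.normalize(target, dim=-1)
    return (1.0 - (p * t).sum(dim=-1)).mean()       # mean cosine distance
```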
- Ponder: Point Cloud Pre-training via Neural Rendering [93.34522605321514]
We propose a novel approach to self-supervised learning of point cloud representations via differentiable neural rendering.
The learned point cloud representations can be easily integrated into various downstream tasks, including not only high-level tasks like 3D detection and segmentation, but also low-level tasks like 3D reconstruction and image rendering.
arXiv Detail & Related papers (2022-12-31T08:58:39Z)
- EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder [60.52613206271329]
This paper introduces Efficient Point Cloud Learning (EPCL) for training high-quality point cloud models with a frozen CLIP transformer.
Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data.
arXiv Detail & Related papers (2022-12-08T06:27:11Z)
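A hedged sketch of the frozen-transformer recipe above: point tokens from a trainable embedding pass through a frozen transformer, and only the tokenizer and head learn. The tokenizer below is a naive stand-in; EPCL's tokenizer and task embedding differ.

```python
import torch
import torch.nn as nn

class FrozenTransformerPointEncoder(nn.Module):
    """Embed point tokens, run them through a frozen transformer pre-trained
    on 2D data, and train only the tokenizer and the task head."""

    def __init__(self, frozen_blocks: nn.Module, dim=768, num_classes=40):
        super().__init__()
        self.tokenizer = nn.Sequential(              # naive point embedding
            nn.Linear(3, dim), nn.GELU(), nn.Linear(dim, dim))
        self.blocks = frozen_blocks
        for p in self.blocks.parameters():
            p.requires_grad = False                  # transformer stays frozen
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_centers):                # (B, T, 3) patch centers
        tokens = self.blocks(self.tokenizer(patch_centers))
        return self.head(tokens.mean(dim=1))         # pool + classify
```

As a stand-in for the CLIP transformer, frozen_blocks could be, e.g., nn.TransformerEncoder(nn.TransformerEncoderLayer(768, 12, batch_first=True), num_layers=12).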
- Leveraging Single-View Images for Unsupervised 3D Point Cloud Completion [53.93172686610741]
Cross-PCC is an unsupervised point cloud completion method that does not require any complete 3D point clouds.
To take advantage of the complementary information from 2D images, we use a single-view RGB image to extract 2D features.
Our method even achieves comparable performance to some supervised methods.
arXiv Detail & Related papers (2022-12-01T15:11:21Z)
- P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting [94.11915008006483]
We propose a novel Point-to-Pixel prompting scheme for point cloud analysis.
Our method attains 89.3% accuracy on the hardest setting of ScanObjectNN.
Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet part segmentation.
arXiv Detail & Related papers (2022-08-04T17:59:03Z)
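A toy orthographic rasterizer showing the point-to-pixel direction of travel; P2P's geometry-preserved projection and learnable coloring module are considerably more careful, so treat this purely as an illustration.

```python
import torch

def points_to_prompt_image(points, img_size=224, extent=1.0):
    """Rasterize a normalized point cloud (N, 3) into a 3-channel
    pseudo-image so a pre-trained 2D model can consume it."""
    u = ((points[:, 0] + extent) / (2 * extent) * (img_size - 1)).long()
    v = ((points[:, 1] + extent) / (2 * extent) * (img_size - 1)).long()
    u, v = u.clamp(0, img_size - 1), v.clamp(0, img_size - 1)
    depth = (points[:, 2] + extent) / (2 * extent)    # encode height as gray
    img = torch.zeros(3, img_size, img_size)
    img[:, v, u] = depth                              # broadcast across RGB
    return img
```

The resulting tensor (unsqueezed to a batch) can then be fed to any pre-trained 2D backbone, which is the point of the prompting setup.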
- PointAttN: You Only Need Attention for Point Cloud Completion [89.88766317412052]
Point cloud completion refers to completing 3D shapes from partial 3D point clouds.
We propose a novel neural network that processes point clouds in a per-point manner, eliminating the need for kNN neighborhood retrieval.
The proposed framework, namely PointAttN, is simple, neat, and effective, and precisely captures the structural information of 3D shapes.
arXiv Detail & Related papers (2022-03-16T09:20:01Z)
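A simplified per-point attention block: full self-attention over all points stands in for kNN neighborhood grouping. The dimensions and residual wiring are assumptions rather than PointAttN's exact architecture.

```python
import torch
import torch.nn as nn

class PerPointAttention(nn.Module):
    """Attention across all points at once, with no kNN retrieval step."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):                        # (B, N, dim) features
        # Every point attends to every other point directly.
        out, _ = self.attn(feats, feats, feats, need_weights=False)
        return self.norm(feats + out)                # residual + norm
```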
- Point Cloud Pre-training by Mixing and Disentangling [35.18101910728478]
Mixing and Disentangling (MD) is a self-supervised learning approach for point cloud pre-training.
We show that an encoder pre-trained with MD significantly surpasses the same encoder trained from scratch and converges quickly.
We hope this self-supervised learning attempt on point clouds can pave the way for reducing deep models' dependence on large-scale labeled data.
arXiv Detail & Related papers (2021-09-01T15:52:18Z)
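The mixing half of the pretext task, in outline; the disentangling decoder and the training losses are omitted, and the label channel here is only a guess at how origin information could be tracked.

```python
import torch

def mix_point_clouds(pc_a, pc_b):
    """Mix two point clouds (Na, 3) and (Nb, 3) into one unordered set;
    the pretext task is to disentangle them again from encoded features."""
    mixed = torch.cat([pc_a, pc_b], dim=0)
    perm = torch.randperm(mixed.shape[0])            # permutation hides origin
    labels = torch.cat([torch.zeros(pc_a.shape[0]),  # which cloud each point
                        torch.ones(pc_b.shape[0])])  # came from
    return mixed[perm], labels[perm].long()
```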
- SSPU-Net: Self-Supervised Point Cloud Upsampling via Differentiable Rendering [21.563862632172363]
We propose a self-supervised point cloud upsampling network (SSPU-Net) to generate dense point clouds without using ground truth.
To achieve this, we exploit the consistency between the input sparse point cloud and the generated dense point cloud in terms of both their shapes and their rendered images.
arXiv Detail & Related papers (2021-08-01T13:26:01Z)
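A sketch of the two consistency terms the summary above mentions, with render_fn standing in for an assumed differentiable point renderer; SSPU-Net's actual losses differ in detail.

```python
import torch
import torch.nn.functional as F

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3), b: (M, 3)."""
    d = torch.cdist(a, b)                            # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def sspu_style_loss(sparse_pc, dense_pc, render_fn):
    """Shape plus rendered-image consistency between the sparse input and
    the upsampled output; render_fn is an assumed differentiable renderer."""
    shape_term = chamfer(dense_pc, sparse_pc)
    image_term = F.l1_loss(render_fn(dense_pc), render_fn(sparse_pc).detach())
    return shape_term + image_term
```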