MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds
- URL: http://arxiv.org/abs/2212.07207v5
- Date: Thu, 7 Dec 2023 16:36:15 GMT
- Title: MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds
- Authors: Georg Krispel, David Schinagl, Christian Fruhwirth-Reisinger, Horst Possegger, Horst Bischof
- Abstract summary: Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction.
In a novel reconstruction approach, MAELi distinguishes between empty and occluded space.
Thereby, without any ground truth whatsoever and trained on single frames only, MAELi obtains an understanding of the underlying 3D scene geometry and semantics.
- Score: 13.426810473131642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The sensing process of large-scale LiDAR point clouds inevitably causes large
blind spots, i.e., regions not visible to the sensor. We demonstrate how these
inherent sampling properties can be effectively utilized for self-supervised
representation learning by designing a highly effective pre-training framework
that considerably reduces the need for tedious 3D annotations to train
state-of-the-art object detectors. Our Masked AutoEncoder for LiDAR point
clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both
the encoder and decoder during reconstruction. This results in more expressive
and useful initialization, which can be directly applied to downstream
perception tasks, such as 3D object detection or semantic segmentation for
autonomous driving. In a novel reconstruction approach, MAELi distinguishes
between empty and occluded space and employs a new masking strategy that
targets the LiDAR's inherent spherical projection. Thereby, without any ground
truth whatsoever and trained on single frames only, MAELi obtains an
understanding of the underlying 3D scene geometry and semantics. To demonstrate
the potential of MAELi, we pre-train backbones in an end-to-end manner and show
the effectiveness of our unsupervised pre-trained weights on the tasks of 3D
object detection and semantic segmentation.
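The abstract names two mechanisms without spelling them out: a masking strategy aligned with the LiDAR's spherical projection, and a reconstruction target that separates observed-empty from occluded space. The numpy sketch below is a minimal illustration of both ideas under assumed names, grid resolutions and thresholds; it is not MAELi's actual implementation.

```python
# Illustrative sketch only, not the authors' code. It mimics two ideas from
# the abstract: masking on the sensor's spherical (range-image) grid, and
# labelling space along a ray as empty / occupied / occluded-unknown.
import numpy as np

def spherical_mask(points, h_bins=64, w_bins=1024, mask_ratio=0.7, rng=None):
    """Drop points whose (elevation, azimuth) cell on the range-image grid
    is masked, i.e. masking in the spherical projection, not in 3D."""
    rng = rng if rng is not None else np.random.default_rng(0)
    azim = np.arctan2(points[:, 1], points[:, 0])                  # [-pi, pi]
    elev = np.arctan2(points[:, 2], np.linalg.norm(points[:, :2], axis=1))
    row = ((elev - elev.min()) / (np.ptp(elev) + 1e-6) * (h_bins - 1)).astype(int)
    col = ((azim + np.pi) / (2 * np.pi) * (w_bins - 1)).astype(int)
    keep_cell = rng.random((h_bins, w_bins)) > mask_ratio
    keep = keep_cell[row, col]
    return points[keep], points[~keep]         # visible input, masked targets

def ray_space_labels(point, voxel=0.5, max_range=60.0):
    """Walk one ray from the sensor origin to its return: cells before the
    hit are observed-empty (0), the hit cell is occupied (1), and cells
    behind the hit are occluded/unknown (-1) rather than empty."""
    dist = np.linalg.norm(point)
    steps = np.arange(voxel, max_range, voxel)
    return np.where(steps < dist - voxel, 0,
           np.where(np.abs(steps - dist) <= voxel, 1, -1))

pts = np.random.default_rng(1).normal(size=(2048, 3)) * 10
visible, masked = spherical_mask(pts)
print(visible.shape, masked.shape)
print(ray_space_labels(np.array([8.0, 3.0, -1.0]))[:25])
```

The third label is the crux: cells behind a return are occluded rather than empty, so a reconstruction objective can avoid penalising the network for geometry the sensor could never have observed.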
Related papers
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
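OccNeRF's exact parameterisation is not given in this summary; as an illustrative stand-in, a mip-NeRF-360-style scene contraction shows how an occupancy field can be reparameterised so that ray samples cover an effectively unbounded range:

```python
# Illustrative stand-in, not OccNeRF's actual parameterisation: contract all
# of R^3 into a ball of radius 2 so a bounded grid of ray samples can
# represent an unbounded scene.
import numpy as np

def contract(x):
    """Identity inside the unit ball; radius 2 - 1/||x|| outside, so any
    point, however far away, lands strictly inside the radius-2 ball."""
    n = np.linalg.norm(x, axis=-1, keepdims=True)
    return np.where(n <= 1.0, x, (2.0 - 1.0 / n) * x / n)

print(contract(np.array([[0.5, 0.0, 0.0], [100.0, 0.0, 0.0]])))
# -> [[0.5  0.   0.  ]
#     [1.99 0.   0.  ]]
```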
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection [0.7234862895932991]
Recent advances introduced pseudo-LiDAR, i.e., synthetic dense point clouds, using additional modalities such as cameras to enhance 3D object detection.
We present a novel LiDAR-only framework that augments raw scans with dense pseudo point clouds by relying on LiDAR sensors and scene semantics.
arXiv Detail & Related papers (2023-09-16T09:18:47Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
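A minimal sketch of such a pretext task, assuming supervision comes from query points sampled just in front of and just behind each return along its ray (the offsets and names below are illustrative, not ALSO's actual recipe):

```python
# Hedged sketch of an occupancy-style pretext task: points just before a
# LiDAR return along its ray are labelled free, points just past it
# occupied. A backbone plus a small occupancy head would be trained on
# these (query, label) pairs with binary cross-entropy, then the head
# discarded before fine-tuning.
import numpy as np

def occupancy_queries(points, delta=0.1):
    dirs = points / (np.linalg.norm(points, axis=1, keepdims=True) + 1e-9)
    front = points - delta * dirs    # just before the surface -> free (0)
    behind = points + delta * dirs   # just behind the surface -> occupied (1)
    queries = np.concatenate([front, behind], axis=0)
    labels = np.concatenate([np.zeros(len(points)), np.ones(len(points))])
    return queries, labels

q, y = occupancy_queries(np.random.default_rng(0).normal(size=(1024, 3)) * 15)
print(q.shape, y.mean())   # (2048, 3) 0.5
```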
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
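A rough, hypothetical rendering of BEV-guided masking (cell size and mask ratio are assumptions): decide the mask per ground-plane cell, so that an entire vertical pillar of points is hidden at once and the 3D encoder never sees any point from a masked pillar.

```python
# Toy BEV-guided masking: points are assigned to bird's-eye-view grid cells
# by their (x, y) coordinates only, and whole cells (i.e. vertical pillars)
# are masked together. Not BEV-MAE's actual settings.
import numpy as np

def bev_mask(points, cell=1.0, mask_ratio=0.5, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    ij = np.floor(points[:, :2] / cell).astype(int)
    cells, inverse = np.unique(ij, axis=0, return_inverse=True)
    masked_cell = rng.random(len(cells)) < mask_ratio
    masked = masked_cell[inverse]
    return points[~masked], points[masked]   # visible input, masked targets

visible, targets = bev_mask(np.random.default_rng(1).normal(size=(4096, 3)) * 20)
print(visible.shape, targets.shape)
```

The learnable point token mentioned above would then stand in at masked locations so the sparse encoder's receptive field stays consistent; that part is omitted from this sketch.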
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3D dataset.
arXiv Detail & Related papers (2022-09-01T12:32:40Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders [13.119676419877244]
We propose a solution to reduce the dependence on labelled 3D training data by leveraging pre-training on large-scale unlabeled outdoor LiDAR point clouds.
Our approach introduces a new self-supervised masked occupancy pre-training method called Occupancy-MAE.
For 3D object detection, Occupancy-MAE reduces the labelled data required for car detection on the KITTI dataset by half.
For 3D semantic segmentation, Occupancy-MAE outperforms training from scratch by around 2% in mIoU.
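The training signal can be pictured as binary occupancy reconstruction. The sketch below voxelises a scan and scores a (here artificially perturbed) prediction with binary cross-entropy; it is illustrative only, since the actual method masks voxels and runs a sparse 3D encoder over the remainder. Extent and resolution are assumptions.

```python
# Minimal occupancy-reconstruction objective: voxelise a scan into a binary
# grid and score a prediction against it with binary cross-entropy.
import numpy as np

def occupancy_grid(points, extent=40.0, res=0.5):
    dims = int(2 * extent / res)
    idx = np.floor((points + extent) / res).astype(int)
    idx = idx[((idx >= 0) & (idx < dims)).all(axis=1)]   # keep in-range voxels
    grid = np.zeros((dims, dims, dims), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def bce(pred, target, eps=1e-6):
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

grid = occupancy_grid(np.random.default_rng(0).normal(size=(4096, 3)) * 10)
pred = np.clip(grid + np.random.default_rng(1).uniform(-0.2, 0.2, grid.shape), 0.0, 1.0)
print(bce(pred, grid))
```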
arXiv Detail & Related papers (2022-06-20T17:15:50Z)
- Self-Supervised Point Cloud Representation Learning with Occlusion Auto-Encoder [63.77257588569852]
We present 3D Occlusion Auto-Encoder (3D-OAE) for learning representations for point clouds.
Our key idea is to randomly occlude some local patches of the input point cloud and establish the supervision via recovering the occluded patches.
In contrast with previous methods, our 3D-OAE can remove a large proportion of patches and predict them only with a small number of visible patches.
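In contrast to the range-image and BEV masking sketches above, patch-wise occlusion operates on local neighbourhoods. A toy version (farthest point sampling for patch centres plus k-nearest-neighbour grouping, with all sizes assumed) could look like:

```python
# Toy patch-wise occlusion in the spirit of 3D-OAE: pick patch centres by
# farthest point sampling, group each centre's k nearest points into a
# patch, then hide a large fraction of the patches. Sizes are assumptions.
import numpy as np

def farthest_point_sample(points, m):
    chosen = [0]
    d = np.linalg.norm(points - points[0], axis=1)
    for _ in range(m - 1):
        chosen.append(int(d.argmax()))
        d = np.minimum(d, np.linalg.norm(points - points[chosen[-1]], axis=1))
    return np.array(chosen)

def occlude_patches(points, n_patches=64, k=32, occlude_ratio=0.75, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    centres = points[farthest_point_sample(points, n_patches)]
    dist = np.linalg.norm(points[None, :, :] - centres[:, None, :], axis=-1)
    patches = np.argsort(dist, axis=1)[:, :k]     # (n_patches, k) point indices
    hidden = rng.random(n_patches) < occlude_ratio
    return points[patches[~hidden]], points[patches[hidden]]

vis, hid = occlude_patches(np.random.default_rng(2).normal(size=(2048, 3)))
print(vis.shape, hid.shape)   # visible patches, occluded prediction targets
```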
arXiv Detail & Related papers (2022-03-26T14:06:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.