PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds
- URL: http://arxiv.org/abs/2503.13914v1
- Date: Tue, 18 Mar 2025 05:17:06 GMT
- Title: PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds
- Authors: Barza Nisar, Steven L. Waslander
- Abstract summary: We propose PSA-SSL, a novel extension to point cloud SSL that learns object pose and size-aware features. Our approach outperforms other state-of-the-art SSL methods on 3D semantic segmentation and 3D object detection.
- Score: 8.645078288584305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) on 3D point clouds has the potential to learn feature representations that can transfer to diverse sensors and multiple downstream perception tasks. However, recent SSL approaches fail to define pretext tasks that retain geometric information such as object pose and scale, which can be detrimental to the performance of downstream localization and geometry-sensitive 3D scene understanding tasks, such as 3D semantic segmentation and 3D object detection. We propose PSA-SSL, a novel extension to point cloud SSL that learns object pose and size-aware (PSA) features. Our approach defines a self-supervised bounding box regression pretext task, which retains object pose and size information. Furthermore, we incorporate LiDAR beam pattern augmentation on input point clouds, which encourages learning sensor-agnostic features. Our experiments demonstrate that with a single pretrained model, our light-weight yet effective extensions achieve significant improvements on 3D semantic segmentation with limited labels across popular autonomous driving datasets (Waymo, nuScenes, SemanticKITTI). Moreover, our approach outperforms other state-of-the-art SSL methods on 3D semantic segmentation (using up to 10 times fewer labels), as well as on 3D object detection. Our code will be released at https://github.com/TRAILab/PSA-SSL.
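The abstract does not say how the LiDAR beam pattern augmentation is implemented. As a rough illustration only, one common way to mimic a sparser sensor is to bin points by elevation angle and drop whole bins. The sketch below assumes evenly spaced beams and a fixed vertical field of view; the function names, parameters, and default values are hypothetical and are not taken from the PSA-SSL code.

```python
import math


def beam_index(point, n_beams, min_elev_deg, max_elev_deg):
    """Assign an (x, y, z) point to one of n_beams evenly spaced elevation bins.

    Assumption: beams are uniformly spaced between min_elev_deg and
    max_elev_deg, which is not true of every real sensor.
    """
    x, y, z = point
    elev = math.degrees(math.atan2(z, math.hypot(x, y)))
    # Clamp to the assumed vertical field of view.
    elev = min(max(elev, min_elev_deg), max_elev_deg)
    frac = (elev - min_elev_deg) / (max_elev_deg - min_elev_deg)
    # The top edge of the field of view falls into the last bin.
    return min(int(frac * n_beams), n_beams - 1)


def subsample_beams(points, n_beams=64, keep_every=2,
                    min_elev_deg=-25.0, max_elev_deg=3.0):
    """Keep only points whose beam index is a multiple of keep_every.

    With keep_every=2 this roughly halves the vertical resolution,
    e.g. a 64-beam scan starts to resemble a 32-beam scan.
    """
    return [
        p for p in points
        if beam_index(p, n_beams, min_elev_deg, max_elev_deg) % keep_every == 0
    ]


if __name__ == "__main__":
    # Four synthetic points at 10 m range, one per bin of a 4-beam sensor.
    pts = [
        (10 * math.cos(math.radians(e)), 0.0, 10 * math.sin(math.radians(e)))
        for e in (-15, -5, 5, 15)
    ]
    kept = subsample_beams(pts, n_beams=4, keep_every=2,
                           min_elev_deg=-20.0, max_elev_deg=20.0)
    print(len(pts), "->", len(kept), "points")
```

Dropping entire beams rather than random points keeps the ring structure of a LiDAR scan, which is the property that differs between sensors.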
Related papers
- Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds [9.994719163112416]
Masked autoencoders (MAE) have shown tremendous potential for self-supervised learning (SSL) in vision and beyond.
Point clouds from LiDARs used in automated driving are particularly challenging for MAEs since large areas of the 3D volume are empty.
We propose the novel neighborhood occupancy MAE (NOMAE) that overcomes the aforementioned challenges by employing masked occupancy reconstruction only in the neighborhood of non-masked voxels.
arXiv Detail & Related papers (2025-02-27T17:42:47Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds [13.426810473131642]
Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction.
In a novel reconstruction approach, MAELi distinguishes between empty and occluded space.
Thereby, without any ground truth whatsoever and trained on single frames only, MAELi obtains an understanding of the underlying 3D scene geometry and semantics.
arXiv Detail & Related papers (2022-12-14T13:10:27Z)
- UpCycling: Semi-supervised 3D Object Detection without Sharing Raw-level Unlabeled Scenes [7.32610370107512]
UpCycling is a novel SSL framework for 3D object detection with zero additional raw-level point cloud.
We introduce hybrid pseudo labels, feature-level Ground Truth sampling (F-GT) and Rotation (F-RoT).
UpCycling significantly outperforms the state-of-the-art SSL methods that utilize raw-point scenes.
arXiv Detail & Related papers (2022-11-22T02:04:09Z)
- SL3D: Self-supervised-Self-labeled 3D Recognition [89.19932178712065]
We propose a Self-supervised-Self-Labeled 3D Recognition (SL3D) framework.
SL3D simultaneously solves two coupled objectives, i.e., clustering and learning feature representation.
It can be applied to solve different 3D recognition tasks, including classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2022-10-30T11:08:25Z)
- Prompt-guided Scene Generation for 3D Zero-Shot Learning [8.658191774247944]
We propose a prompt-guided 3D scene generation and supervision method to augment 3D data to learn the network better.
First, we merge point clouds of two 3D models in certain ways described by a prompt. The prompt acts like the annotation describing each 3D scene.
We have achieved state-of-the-art ZSL and generalized ZSL performance on synthetic (ModelNet40, ModelNet10) and real-scanned (ScanObjectNN) 3D object datasets.
arXiv Detail & Related papers (2022-09-29T11:24:33Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- Zero-Shot Learning on 3D Point Cloud Objects and Beyond [21.6491982908705]
We identify some of the challenges and apply 2D Zero-Shot Learning (ZSL) methods in the 3D domain to analyze the performance of existing models.
A novel loss function is developed that simultaneously aligns seen semantics with point cloud features.
An extensive set of experiments is carried out, establishing state-of-the-art for ZSL and GZSL on synthetic and real datasets.
arXiv Detail & Related papers (2021-04-11T10:04:06Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform great for well represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)