Better Call SAL: Towards Learning to Segment Anything in Lidar
- URL: http://arxiv.org/abs/2403.13129v2
- Date: Thu, 25 Jul 2024 15:32:39 GMT
- Title: Better Call SAL: Towards Learning to Segment Anything in Lidar
- Authors: Aljoša Ošep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé
- Abstract summary: We propose a text-promptable zero-shot model for segmenting and classifying any object in Lidar.
We utilize 2D vision foundation models to generate 3D supervision ``for free'' using pseudo-labels.
Our model achieves $91\%$ of the fully supervised state of the art in terms of class-agnostic segmentation and $54\%$ in terms of zero-shot Lidar Panoptic Segmentation.
- Score: 63.9984147657437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose the SAL (Segment Anything in Lidar) method consisting of a text-promptable zero-shot model for segmenting and classifying any object in Lidar, and a pseudo-labeling engine that facilitates model training without manual supervision. While the established paradigm for Lidar Panoptic Segmentation (LPS) relies on manual supervision for a handful of object classes defined a priori, we utilize 2D vision foundation models to generate 3D supervision ``for free''. Our pseudo-labels consist of instance masks and corresponding CLIP tokens, which we lift to Lidar using calibrated multi-modal data. By training our model on these labels, we distill the 2D foundation models into our Lidar SAL model. Even without manual labels, our model achieves $91\%$ in terms of class-agnostic segmentation and $54\%$ in terms of zero-shot Lidar Panoptic Segmentation of the fully supervised state-of-the-art. Furthermore, we outperform several baselines that do not distill but only lift image features to 3D. More importantly, we demonstrate that SAL supports arbitrary class prompts, can be easily extended to new datasets, and shows significant potential to improve with increasing amounts of self-labeled data. Code and models are available at this $\href{https://github.com/nv-dvl/segment-anything-lidar}{URL}$.
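To make the pipeline concrete, here is a minimal sketch of the two ideas in the abstract: lifting 2D instance masks and their CLIP tokens onto Lidar points via calibrated projection, and classifying the resulting instances zero-shot against text prompts. Everything below (function names, shapes, and the random stand-ins for CLIP features) is our own illustration under stated assumptions, not the authors' implementation.

```python
import numpy as np

def project_points(points_xyz, K, T_cam_from_lidar):
    """Project Nx3 Lidar points to pixels using camera intrinsics K (3x3)
    and the Lidar-to-camera transform T (4x4) from calibration."""
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 0                      # only points ahead of the camera
    z = np.where(in_front, cam[:, 2], 1.0)        # dummy depth for filtered points
    uv = (K @ cam.T).T[:, :2] / z[:, None]        # perspective divide
    return uv, in_front

def lift_masks_to_lidar(points_xyz, masks, K, T):
    """Assign each Lidar point the id of the 2D instance mask it projects
    into (-1 = unlabeled); the point then inherits that mask's CLIP token."""
    uv, in_front = project_points(points_xyz, K, T)
    h, w = masks[0].shape
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    point_instance = np.full(len(points_xyz), -1, dtype=int)
    for inst_id, mask in enumerate(masks):
        hit = inside & mask[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)]
        point_instance[hit] = inst_id
    return point_instance

def zero_shot_classify(instance_tokens, prompt_embeddings, class_names):
    """Match each instance's CLIP token to the closest embedded text prompt
    by cosine similarity: the text-promptable, zero-shot step."""
    a = instance_tokens / np.linalg.norm(instance_tokens, axis=1, keepdims=True)
    b = prompt_embeddings / np.linalg.norm(prompt_embeddings, axis=1, keepdims=True)
    return [class_names[i] for i in (a @ b.T).argmax(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-5.0, 5.0, (1000, 3)) + [0.0, 0.0, 10.0]
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    masks = [np.zeros((480, 640), dtype=bool)]
    masks[0][100:300, 200:400] = True             # one hypothetical SAM-style mask
    inst = lift_masks_to_lidar(pts, masks, K, np.eye(4))
    tokens = rng.normal(size=(1, 512))            # stand-in for the mask's CLIP token
    prompts = rng.normal(size=(3, 512))           # stand-in for embedded text prompts
    print((inst == 0).sum(), "points labeled;",
          zero_shot_classify(tokens, prompts, ["car", "pedestrian", "pole"]))
```

In SAL itself, a Lidar model is then trained on such per-point pseudo-labels, which is what distills the 2D foundation models into the 3D domain.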
Related papers
- Point-SAM: Promptable 3D Segmentation Model for Point Clouds [25.98791840584803]
We propose a 3D promptable segmentation model (Point-SAM) focusing on point clouds.
Our approach utilizes a transformer-based method, extending SAM to the 3D domain.
Our model outperforms state-of-the-art models on several indoor and outdoor benchmarks.
arXiv Detail & Related papers (2024-06-25T17:28:03Z)
- Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels [69.55622471172941]
Large-scale 2D vision-language models such as CLIP can be aligned with a 3D encoder to learn generalizable (open-vocabulary) 3D vision models.
We propose Cross-MoST (Cross-Modal Self-Training), an optimization framework that improves the label-free classification performance of a zero-shot 3D vision model.
arXiv Detail & Related papers (2024-04-15T21:30:50Z)
- PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models [51.24979014650188]
We present PointSeg, a training-free paradigm that leverages off-the-shelf vision foundation models to address 3D scene perception tasks.
PointSeg can segment anything in a 3D scene by acquiring accurate 3D prompts and aligning their corresponding pixels across frames.
Our approach significantly surpasses the state-of-the-art training-free specialist model by 14.1%, 12.3%, and 12.6% mAP on the ScanNet, ScanNet++, and KITTI-360 datasets, respectively.
arXiv Detail & Related papers (2024-03-11T03:28:20Z)
- Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels [141.23836433191624]
Current 3D scene segmentation methods are heavily dependent on manually annotated 3D training datasets.
We propose Segment3D, a method for class-agnostic 3D scene segmentation that produces high-quality 3D segmentation masks.
arXiv Detail & Related papers (2023-12-28T18:57:11Z)
- Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation [46.01433705072047]
We find two types of latent labels, beyond the displayed label, embedded in LiDAR and image data.
We propose a novel augmentation, Cylinder-Mix, which generates more, yet still reliable, samples for training.
We also propose the Instance Position-scale Learning (IPSL) module to learn and fuse instance position and scale information.
arXiv Detail & Related papers (2023-12-13T15:56:24Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo-labels, we introduce a semantic label fusion strategy that combines all of the results via voting (a minimal voting sketch follows this entry).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
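The voting step above admits a straightforward reading as per-point majority voting over the predictions of several 2D models. The sketch below illustrates that reading only; the function name, the `ignore_label` convention, and the tie-breaking rule are our assumptions, not the paper's implementation.

```python
import numpy as np

def fuse_labels_by_voting(per_model_labels, ignore_label=-1):
    """per_model_labels: (num_models, num_points) int array of per-point
    class predictions; returns the per-point majority class, or
    ignore_label where no model made a prediction."""
    _, num_points = per_model_labels.shape
    fused = np.full(num_points, ignore_label, dtype=int)
    for p in range(num_points):
        votes = per_model_labels[:, p]
        votes = votes[votes != ignore_label]      # drop abstentions
        if votes.size:
            vals, counts = np.unique(votes, return_counts=True)
            fused[p] = vals[counts.argmax()]      # ties break toward the smaller id
    return fused

# Three models vote per point; point 1 is decided by majority (class 2):
print(fuse_labels_by_voting(np.array([[0, 2, -1],
                                      [0, 2, 1],
                                      [0, 1, 1]])))   # -> [0 2 1]
```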
- Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
arXiv Detail & Related papers (2023-07-10T17:59:40Z)
- LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving [34.119642131912485]
We present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS).
LWSIS uses off-the-shelf 3D data, i.e., point clouds together with 3D boxes, as natural weak supervision for training 2D image instance segmentation models (a projection sketch follows this entry).
Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the annotation cost of dense 2D masks.
arXiv Detail & Related papers (2022-12-07T08:08:01Z)
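The weak-supervision idea reads as: Lidar points that fall inside an annotated 3D box, once projected into the image, mark pixels that very likely belong to that instance, yielding a coarse mask for free. Below is a minimal sketch under our own assumptions: the box is axis-aligned for brevity (driving datasets typically annotate oriented boxes), and all names are hypothetical.

```python
import numpy as np

def coarse_mask_from_box(points_xyz, box_min, box_max, K, T, img_hw):
    """Mark the pixels hit by projections of Lidar points that lie inside
    the 3D box; the resulting sparse mask is the 'free' weak label."""
    inside = np.all((points_xyz >= box_min) & (points_xyz <= box_max), axis=1)
    pts_h = np.hstack([points_xyz[inside], np.ones((inside.sum(), 1))])
    cam = (T @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]                      # keep points in front of the camera
    uv = (K @ cam.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    h, w = img_hw
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    mask = np.zeros(img_hw, dtype=bool)
    mask[v[ok], u[ok]] = True                     # sparse positives for the 2D segmenter
    return mask
```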
- SuperLine3D: Self-supervised Line Segmentation and Description for LiDAR Point Cloud [35.16632339908634]
We propose the first learning-based feature segmentation and description model for 3D lines in LiDAR point clouds.
Our model can extract lines under arbitrary scale perturbations, and we use a shared EdgeConv encoder to jointly train the segmentation and descriptor heads (a simplified two-head sketch follows this entry).
Experiments demonstrate that our line-based registration method is highly competitive with state-of-the-art point-based approaches.
arXiv Detail & Related papers (2022-08-03T09:06:14Z)
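The shared-encoder, two-head design described above can be sketched as follows. This is a simplified stand-in under our own assumptions: the paper's trunk is built from EdgeConv layers, while a per-point MLP keeps the sketch short here, and all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TwoHeadLineNet(nn.Module):
    """Shared per-point trunk feeding a line-segmentation head and a
    descriptor head, trained jointly (the paper's trunk is EdgeConv)."""
    def __init__(self, in_dim=3, feat_dim=64, desc_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(             # shared encoder
            nn.Linear(in_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.seg_head = nn.Linear(feat_dim, 2)    # line vs. not-line per point
        self.desc_head = nn.Linear(feat_dim, desc_dim)

    def forward(self, points):                    # points: (N, 3)
        feats = self.encoder(points)
        return self.seg_head(feats), self.desc_head(feats)

net = TwoHeadLineNet()
logits, desc = net(torch.randn(1024, 3))
print(logits.shape, desc.shape)                   # (1024, 2) and (1024, 32)
```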
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.