SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Instance Segmentation
- URL: http://arxiv.org/abs/2311.17707v2
- Date: Tue, 04 Feb 2025 11:40:44 GMT
- Title: SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Instance Segmentation
- Authors: Mutian Xu, Xingyilang Yin, Lingteng Qiu, Yang Liu, Xin Tong, Xiaoguang Han,
- Abstract summary: We introduce SAMPro3D for zero-shot instance segmentation of 3D scenes.
Our approach segments 3D instances by applying the pretrained Segment Anything Model (SAM) to 2D frames.
Our method achieves comparable or better performance compared to previous zero-shot or fully supervised approaches.
- Score: 24.733049281032272
- License:
- Abstract: We introduce SAMPro3D for zero-shot instance segmentation of 3D scenes. Given the 3D point cloud and multiple posed RGB-D frames of 3D scenes, our approach segments 3D instances by applying the pretrained Segment Anything Model (SAM) to 2D frames. Our key idea involves locating SAM prompts in 3D to align their projected pixel prompts across frames, ensuring the view consistency of SAM-predicted masks. Moreover, we suggest selecting prompts from the initial set guided by the information of SAM-predicted masks across all views, which enhances the overall performance. We further propose to consolidate different prompts if they are segmenting different surface parts of the same 3D instance, bringing a more comprehensive segmentation. Notably, our method does not require any additional training. Extensive experiments on diverse benchmarks show that our method achieves comparable or better performance compared to previous zero-shot or fully supervised approaches, and in many cases surpasses human annotations. Furthermore, since our fine-grained predictions often lack annotations in available datasets, we present ScanNet200-Fine50 test data which provides fine-grained annotations on 50 scenes from ScanNet200 dataset. The project page can be accessed at https://mutianxu.github.io/sampro3d/.
Related papers
- SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners [87.76470518069338]
We introduce SAM2Point, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for promptable 3D segmentation.
Our framework supports various prompt types, including 3D points, boxes, and masks, and can generalize across diverse scenarios, such as 3D objects, indoor scenes, sparse outdoor environments, and raw LiDAR.
To our best knowledge, we present the most faithful implementation of SAM in 3D, which may serve as a starting point for future research in promptable 3D segmentation.
arXiv Detail & Related papers (2024-08-29T17:59:45Z) - EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z) - Point-SAM: Promptable 3D Segmentation Model for Point Clouds [25.98791840584803]
We propose a 3D promptable segmentation model Point-SAM, focusing on point clouds.
We employ an efficient transformer-based architecture tailored for point clouds, extending SAM to the 3D domain.
We then distill the rich knowledge from 2D SAM for Point-SAM training by introducing a data engine to generate part-level and object-level pseudo-labels at scale from 2D SAM.
arXiv Detail & Related papers (2024-06-25T17:28:03Z) - Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation [91.40798599544136]
We propose a fast yet accurate open-vocabulary 3D instance segmentation approach, named Open-YOLO 3D.
It effectively leverages only 2D object detection from multi-view RGB images for open-vocabulary 3D instance segmentation.
We empirically find that a better performance of matching text prompts to 3D masks can be achieved in a faster fashion with a 2D object detector.
arXiv Detail & Related papers (2024-06-04T17:59:31Z) - SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z) - SAM3D: Segment Anything in 3D Scenes [33.57040455422537]
We propose a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.
For a point cloud of a 3D scene with posed RGB images, we first predict segmentation masks of RGB images with SAM, and then project the 2D masks into the 3D points.
Our approach is experimented with ScanNet dataset and qualitative results demonstrate that our SAM3D achieves reasonable and fine-grained 3D segmentation results without any training or finetuning.
arXiv Detail & Related papers (2023-06-06T17:59:51Z) - Personalize Segment Anything Model with One Shot [52.54453744941516]
We propose a training-free Personalization approach for Segment Anything Model (SAM)
Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior.
PerSAM segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement.
arXiv Detail & Related papers (2023-05-04T17:59:36Z) - Segment Anything in 3D with Radiance Fields [83.14130158502493]
This paper generalizes the Segment Anything Model (SAM) to segment 3D objects.
We refer to the proposed solution as SA3D, short for Segment Anything in 3D.
We show in experiments that SA3D adapts to various scenes and achieves 3D segmentation within seconds.
arXiv Detail & Related papers (2023-04-24T17:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.