SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners
- URL: http://arxiv.org/abs/2408.16768v1
- Date: Thu, 29 Aug 2024 17:59:45 GMT
- Title: SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners
- Authors: Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Chengzhuo Tong, Peng Gao, Chunyuan Li, Pheng-Ann Heng
- Abstract summary: We introduce SAM2Point, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for promptable 3D segmentation.
Our framework supports various prompt types, including 3D points, boxes, and masks, and can generalize across diverse scenarios, such as 3D objects, indoor scenes, outdoor environments, and raw sparse LiDAR.
To the best of our knowledge, we present the most faithful implementation of SAM in 3D, which may serve as a starting point for future research in promptable 3D segmentation.
- Score: 87.76470518069338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SAM2Point, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for zero-shot and promptable 3D segmentation. SAM2Point interprets any 3D data as a series of multi-directional videos, and leverages SAM 2 for 3D-space segmentation, without further training or 2D-3D projection. Our framework supports various prompt types, including 3D points, boxes, and masks, and can generalize across diverse scenarios, such as 3D objects, indoor scenes, outdoor environments, and raw sparse LiDAR. Demonstrations on multiple 3D datasets, e.g., Objaverse, S3DIS, ScanNet, Semantic3D, and KITTI, highlight the robust generalization capabilities of SAM2Point. To the best of our knowledge, we present the most faithful implementation of SAM in 3D, which may serve as a starting point for future research in promptable 3D segmentation. Online Demo: https://huggingface.co/spaces/ZiyuG/SAM2Point . Code: https://github.com/ZiyuGuo99/SAM2Point .
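As the abstract describes it, SAM2Point reinterprets 3D data as stacks of 2D frames so that SAM 2's video segmentation can run without retraining. Below is a minimal sketch of that interpretation, assuming a simple color voxel grid and axis-aligned slicing; the resolution and helper names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: voxelize a colored point cloud, then sweep the grid along one
# axis so each slice becomes a "video frame" for a promptable video segmenter.
import numpy as np

def voxelize(points, colors, resolution=128):
    """Scatter N x 3 points (with N x 3 RGB colors) into a dense voxel grid."""
    mins, maxs = points.min(0), points.max(0)
    idx = ((points - mins) / (maxs - mins + 1e-8) * (resolution - 1)).astype(int)
    grid = np.zeros((resolution, resolution, resolution, 3), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = colors  # last point per voxel wins
    return grid

def slices_as_video(grid, axis=2):
    """Yield H x W x 3 frames by sweeping the voxel grid along one axis."""
    for i in range(grid.shape[axis]):
        yield np.take(grid, i, axis=axis)

points = np.random.rand(10_000, 3)  # stand-in point cloud
colors = np.random.rand(10_000, 3)  # stand-in per-point RGB
grid = voxelize(points, colors)
frames = list(slices_as_video(grid, axis=2))
# A "multi-directional" treatment would repeat this for each axis and sweep
# direction, then hand the frame sequences to SAM 2's video predictor.
```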
Related papers
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand the 3D scene while simultaneously exploring it.
An online, real-time, fine-grained, and highly generalizable 3D perception model is urgently needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z)
- Point-SAM: Promptable 3D Segmentation Model for Point Clouds [25.98791840584803]
We propose a 3D promptable segmentation model (Point-SAM) focusing on point clouds.
Our approach uses a transformer-based architecture, extending SAM to the 3D domain.
Our model outperforms state-of-the-art models on several indoor and outdoor benchmarks.
arXiv Detail & Related papers (2024-06-25T17:28:03Z)
- SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation [26.207530327673748]
We introduce SAMPro3D for zero-shot 3D indoor scene segmentation.
Our approach segments 3D scenes by applying the pretrained Segment Anything Model (SAM) to 2D frames.
Our method consistently achieves higher quality and more diverse segmentation than previous zero-shot or fully supervised approaches.
arXiv Detail & Related papers (2023-11-29T15:11:03Z)
- 3D-LLM: Injecting the 3D World into Large Language Models [60.43823088804661]
Large language models (LLMs) and vision-language models (VLMs) have proven to excel at multiple tasks, such as commonsense reasoning.
We propose to inject the 3D world into large language models and introduce a new family of 3D-LLMs.
Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks.
arXiv Detail & Related papers (2023-07-24T17:59:02Z)
- TomoSAM: a 3D Slicer extension using SAM for tomography segmentation [62.997667081978825]
TomoSAM has been developed to integrate the cutting-edge Segment Anything Model (SAM) into 3D Slicer.
SAM is a promptable deep learning model that is able to identify objects and create image masks in a zero-shot manner.
The synergy between these tools aids in the segmentation of complex 3D datasets from tomography or other imaging techniques.
arXiv Detail & Related papers (2023-06-14T16:13:27Z)
- SAM3D: Segment Anything in 3D Scenes [33.57040455422537]
We propose a framework that predicts masks in 3D point clouds by leveraging the Segment Anything Model (SAM) on RGB images, without further training or finetuning.
For a point cloud of a 3D scene with posed RGB images, we first predict segmentation masks on the RGB images with SAM, and then project the 2D masks onto the 3D points.
Our approach is evaluated on the ScanNet dataset, and qualitative results demonstrate that SAM3D achieves reasonable, fine-grained 3D segmentation without any training or finetuning; a minimal projection sketch follows this entry.
arXiv Detail & Related papers (2023-06-06T17:59:51Z)
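To make the 2D-to-3D lifting step above concrete, here is a hedged sketch, not the SAM3D release: it assumes a pinhole camera with intrinsics K and a 4x4 world-to-camera extrinsic matrix, and it omits depth testing and multi-view mask fusion.

```python
# Hedged sketch: lift a 2D SAM mask onto 3D points using a known camera pose.
import numpy as np

def project_mask_to_points(points_world, mask, K, world_to_cam):
    """Return a boolean label per 3D point: True if it projects inside the mask."""
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])   # N x 4 homogeneous coords
    cam = (world_to_cam @ homog.T).T[:, :3]              # N x 3 camera coordinates
    in_front = cam[:, 2] > 0                             # discard points behind camera
    uv = (K @ cam.T).T                                   # pinhole projection
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-8, None)     # perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    h, w = mask.shape
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.zeros(n, dtype=bool)
    labels[valid] = mask[v[valid], u[valid]]             # sample the mask per pixel
    return labels
```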
- SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model [59.04877271899894]
This paper explores adapting the zero-shot ability of SAM to 3D object detection.
We propose a SAM-powered BEV processing pipeline that detects objects and achieves promising results on a large-scale open dataset; a short BEV rasterization sketch follows this entry.
arXiv Detail & Related papers (2023-06-04T03:09:21Z)
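As a rough illustration of the BEV idea, the sketch below rasterizes LiDAR points into a top-down height image that a 2D segmenter such as SAM could then process; the metric range, cell size, and max-height encoding are assumptions, not the paper's pipeline.

```python
# Hedged sketch: project N x 3 LiDAR points to a bird's-eye-view height image.
import numpy as np

def lidar_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), res=0.2):
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]
    u = ((pts[:, 0] - x_range[0]) / res).astype(int)   # row index from x
    v = ((pts[:, 1] - y_range[0]) / res).astype(int)   # column index from y
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    np.maximum.at(bev, (u, v), pts[:, 2])  # keep the max height per cell
    return bev                             # zero floor is a simplification
```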
- Segment Anything in 3D with Radiance Fields [83.14130158502493]
This paper generalizes the Segment Anything Model (SAM) to segment 3D objects.
We refer to the proposed solution as SA3D, short for Segment Anything in 3D.
We show in experiments that SA3D adapts to various scenes and achieves 3D segmentation within seconds.
arXiv Detail & Related papers (2023-04-24T17:57:15Z)