SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
- URL: http://arxiv.org/abs/2306.02245v2
- Date: Mon, 29 Jan 2024 12:14:04 GMT
- Title: SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
- Authors: Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing
Ye, Zhe Liu, Xiang Bai
- Abstract summary: This paper explores adapting the zero-shot ability of SAM to 3D object detection.
We propose a SAM-powered BEV processing pipeline to detect objects and obtain promising results on the large-scale Waymo Open Dataset.
- Score: 59.04877271899894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of large language models, many remarkable linguistic
systems like ChatGPT have thrived and achieved astonishing success on many
tasks, showing the incredible power of foundation models. In the spirit of
unleashing the capability of foundation models on vision tasks, the Segment
Anything Model (SAM), a vision foundation model for image segmentation, has
been proposed recently and presents strong zero-shot ability on many downstream
2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be
explored, especially 3D object detection. With this inspiration, we explore
adapting the zero-shot ability of SAM to 3D object detection in this paper. We
propose a SAM-powered BEV processing pipeline to detect objects and get
promising results on the large-scale Waymo Open Dataset. As an early attempt,
our method takes a step toward 3D object detection with vision foundation
models and presents the opportunity to unleash their power on 3D vision tasks.
The code is released at https://github.com/DYZhang09/SAM3D.
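The abstract sketches the idea at a high level: rasterize the LiDAR point cloud into a bird's-eye-view (BEV) image, let SAM segment it zero-shot, and read detections off the masks. Below is a minimal sketch of that flow, assuming the official `segment_anything` package and a downloaded checkpoint; the BEV extent, the intensity encoding, and all helper names are illustrative assumptions, not the authors' implementation (see the repository above for that).

```python
# Minimal sketch of a SAM-powered BEV pipeline, assuming the official
# `segment_anything` package. BEV ranges, the intensity encoding, and the
# helper names are illustrative assumptions, not the SAM3D implementation.
import numpy as np
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry


def points_to_bev(points: np.ndarray,
                  x_range=(-40.0, 40.0),
                  y_range=(-40.0, 40.0),
                  resolution: float = 0.1) -> np.ndarray:
    """Rasterize an (N, 4) point cloud (x, y, z, intensity) into an
    HxWx3 uint8 BEV image that SAM can consume."""
    w = int((x_range[1] - x_range[0]) / resolution)
    h = int((y_range[1] - y_range[0]) / resolution)
    # Keep only the points inside the BEV extent.
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]
    cols = ((pts[:, 0] - x_range[0]) / resolution).astype(int)
    rows = ((pts[:, 1] - y_range[0]) / resolution).astype(int)
    bev = np.zeros((h, w), dtype=np.float32)
    # Keep the max intensity per cell (np.maximum.at handles duplicate cells).
    np.maximum.at(bev, (rows, cols), pts[:, 3])
    bev = (255.0 * bev / max(float(bev.max()), 1e-6)).astype(np.uint8)
    return np.stack([bev] * 3, axis=-1)


def detect_bev_boxes(points: np.ndarray, checkpoint: str) -> list:
    """Segment the BEV image with SAM and return 2D boxes in BEV pixels.
    A full detector would lift each box to 3D using per-mask point heights."""
    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    generator = SamAutomaticMaskGenerator(sam)
    masks = generator.generate(points_to_bev(points))
    return [m["bbox"] for m in masks]  # XYWH boxes, one per SAM mask
```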
Related papers
- Point-SAM: Promptable 3D Segmentation Model for Point Clouds [25.98791840584803]
We propose a 3D promptable segmentation model (Point-SAM) focusing on point clouds.
Our approach utilizes a transformer-based method, extending SAM to the 3D domain.
Our model outperforms state-of-the-art models on several indoor and outdoor benchmarks.
arXiv Detail & Related papers (2024-06-25T17:28:03Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- Probing the 3D Awareness of Visual Foundation Models [56.68380136809413]
We analyze the 3D awareness of visual foundation models.
We conduct experiments using task-specific probes and zero-shot inference procedures on frozen features.
arXiv Detail & Related papers (2024-04-12T17:58:04Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that combines all the results via voting (see the voting sketch after this list).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild [32.05421669957098]
Large datasets and scalable solutions have led to unprecedented advances in 2D recognition.
We revisit the task of 3D object detection by introducing a large benchmark, called Omni3D.
We propose a model, called Cube R-CNN, and show that it outperforms prior works on the larger Omni3D and existing benchmarks.
arXiv Detail & Related papers (2022-07-21T17:56:22Z)
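The voting-based fusion mentioned in the label-efficient segmentation entry above can be illustrated with a short sketch: each 2D vision model contributes a per-point class prediction (after projecting its masks onto the point cloud), and the majority label becomes the 3D pseudo label. Array shapes and the tie-breaking rule below are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal majority-voting fusion sketch for 3D pseudo labels; shapes and
# tie-breaking (lowest class id wins) are assumptions for illustration.
import numpy as np


def fuse_labels_by_voting(per_model_labels: np.ndarray) -> np.ndarray:
    """per_model_labels: (num_models, num_points) int class ids, one row per
    vision model's prediction projected onto the point cloud. Returns the
    (num_points,) majority-voted pseudo labels."""
    num_classes = int(per_model_labels.max()) + 1
    num_points = per_model_labels.shape[1]
    votes = np.zeros((num_classes, num_points), dtype=np.int64)
    for model_pred in per_model_labels:
        votes[model_pred, np.arange(num_points)] += 1  # one vote per model
    return votes.argmax(axis=0)  # argmax breaks ties toward the lowest id


# Three models label three points; the last two points resolve to class 1.
preds = np.array([[0, 1, 2],
                  [0, 1, 1],
                  [2, 1, 1]])
print(fuse_labels_by_voting(preds))  # -> [0 1 1]
```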
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.