Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene
Representation
- URL: http://arxiv.org/abs/2310.03923v1
- Date: Thu, 5 Oct 2023 21:57:36 GMT
- Title: Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene
Representation
- Authors: Kashu Yamazaki, Taisei Hanyu, Khoa Vo, Thang Pham, Minh Tran,
Gianfranco Doretto, Anh Nguyen, Ngan Le
- Abstract summary: Open-Fusion is a groundbreaking approach for real-time open-vocabulary 3D mapping and queryable scene representation.
It harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension.
It delivers annotation-free open-vocabulary 3D segmentation without requiring additional 3D training.
- Score: 13.770613689032503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precise 3D environmental mapping is pivotal in robotics. Existing methods
often rely on predefined concepts during training or are time-intensive when
generating semantic maps. This paper presents Open-Fusion, a groundbreaking
approach for real-time open-vocabulary 3D mapping and queryable scene
representation using RGB-D data. Open-Fusion harnesses the power of a
pre-trained vision-language foundation model (VLFM) for open-set semantic
comprehension and employs the Truncated Signed Distance Function (TSDF) for
swift 3D scene reconstruction. By leveraging the VLFM, we extract region-based
embeddings and their associated confidence maps. These are then integrated with
3D knowledge from TSDF using an enhanced Hungarian-based feature-matching
mechanism. Notably, Open-Fusion delivers annotation-free open-vocabulary 3D
segmentation without requiring additional 3D training.
Benchmark tests on the ScanNet dataset against leading zero-shot methods
highlight Open-Fusion's superiority. Furthermore, it seamlessly combines the
strengths of region-based VLFM and TSDF, facilitating real-time 3D scene
comprehension that includes object concepts and open-world semantics. We
encourage the readers to view the demos on our project page:
https://uark-aicv.github.io/OpenFusion
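The pipeline the abstract describes — region-based VLFM embeddings with confidence maps, assigned to TSDF-reconstructed 3D segments via Hungarian matching — can be sketched roughly as follows. This is a minimal illustrative stand-in, not the paper's implementation: the confidence weighting, cosine cost, and array shapes are all assumptions, and the "enhanced" matching mechanism in the paper is more involved than a plain assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_regions_to_segments(region_embs, segment_embs, region_conf):
    """Assign 2D region embeddings to 3D segments by maximizing
    confidence-weighted cosine similarity (a simplified stand-in for
    the paper's enhanced Hungarian-based feature matching)."""
    # L2-normalize so dot products are cosine similarities
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    s = segment_embs / np.linalg.norm(segment_embs, axis=1, keepdims=True)
    sim = r @ s.T                          # (R, S) cosine similarities
    cost = -(region_conf[:, None] * sim)   # Hungarian solvers minimize cost
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy example: 3 image regions, 4 map segments, 8-dim embeddings.
# In the real system these would come from the VLFM and the TSDF map.
rng = np.random.default_rng(0)
regions = rng.normal(size=(3, 8))
segments = rng.normal(size=(4, 8))
conf = np.array([0.9, 0.7, 0.8])
pairs = match_regions_to_segments(regions, segments, conf)
```

`linear_sum_assignment` handles the rectangular case directly (fewer regions than segments), which fits incremental mapping where new frames only observe part of the scene.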
Related papers
- OpenSU3D: Open World 3D Scene Understanding using Foundation Models [2.1262749936758216]
We present a novel, scalable approach for constructing open set, instance-level 3D scene representations.
Existing methods require pre-constructed 3D scenes and face scalability issues due to per-point feature vector learning.
We evaluate our proposed approach on multiple scenes from ScanNet and Replica datasets demonstrating zero-shot generalization capabilities.
arXiv Detail & Related papers (2024-07-19T13:01:12Z)
- OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding [54.981605111365056]
This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding.
Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing.
arXiv Detail & Related papers (2024-06-04T07:42:33Z)
- Open-Vocabulary SAM3D: Understand Any 3D Scene [32.00537984541871]
We introduce OV-SAM3D, a universal framework for open-vocabulary 3D scene understanding.
This framework is designed to perform understanding tasks for any 3D scene without requiring prior knowledge of the scene.
Empirical evaluations conducted on the ScanNet200 and nuScenes datasets demonstrate that our approach surpasses existing open-vocabulary methods in unknown open-world environments.
arXiv Detail & Related papers (2024-05-24T14:07:57Z)
- OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views [90.71215823587875]
We propose OpenNeRF which naturally operates on posed images and directly encodes the VLM features within the NeRF.
Our work shows that using pixel-wise VLM features results in an overall less complex architecture without the need for additional DINO regularization.
For 3D point cloud segmentation on the Replica dataset, OpenNeRF outperforms recent open-vocabulary methods such as LERF and OpenScene by at least +4.9 mIoU.
arXiv Detail & Related papers (2024-04-04T17:59:08Z)
- Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships [15.513180297629546]
We present Open3DSG, an alternative approach to learn 3D scene graph prediction in an open world without requiring labeled scene graph data.
We co-embed the features from a 3D scene graph prediction backbone with the feature space of powerful open world 2D vision language foundation models.
arXiv Detail & Related papers (2024-02-19T16:15:03Z)
- ConceptFusion: Open-set Multimodal 3D Mapping [91.23054486724402]
ConceptFusion is a scene representation that is fundamentally open-set.
It enables reasoning beyond a closed set of concepts and is inherently multimodal.
We evaluate ConceptFusion on a number of real-world datasets.
arXiv Detail & Related papers (2023-02-14T18:40:26Z)
- Diffusion-SDF: Text-to-Shape via Voxelized Diffusion [90.85011923436593]
We propose a new generative 3D modeling framework called Diffusion-SDF for the challenging task of text-to-shape synthesis.
We show that Diffusion-SDF generates both higher quality and more diversified 3D shapes that conform well to given text descriptions.
arXiv Detail & Related papers (2022-12-06T19:46:47Z)
- OpenScene: 3D Scene Understanding with Open Vocabularies [73.1411930820683]
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision.
We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space.
This zero-shot approach enables task-agnostic training and open-vocabulary queries.
arXiv Detail & Related papers (2022-11-28T18:58:36Z)
- DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization [56.15308829924527]
We propose a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points.
For detecting 3D keypoints we predict the discriminativeness of the local descriptors in an unsupervised manner.
Experiments on various benchmarks demonstrate that our method achieves competitive results for both global point cloud retrieval and local point cloud registration.
arXiv Detail & Related papers (2020-07-17T20:21:22Z)
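Several of the methods above (Open-Fusion itself, OpenScene, ConceptFusion) answer open-vocabulary queries the same way: embed a text prompt into a shared vision-language feature space and rank map points or regions by cosine similarity. A minimal sketch of that query step follows; the random features and 16-dimensional text vector are toy stand-ins — in a real system the features would live in CLIP space and the text vector would come from a CLIP-style text encoder.

```python
import numpy as np

def query_map(point_feats, text_emb, top_k=5):
    """Rank map points by cosine similarity to a text embedding —
    the zero-shot open-vocabulary query step shared by these methods."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    scores = p @ t                     # cosine similarity per point
    idx = np.argsort(-scores)[:top_k]  # best matches first
    return idx, scores[idx]

# Toy stand-in: 1000 map points with 16-dim features and a random
# "text" vector; real systems would use CLIP-space features and
# an encoded prompt such as "a chair".
rng = np.random.default_rng(1)
feats = rng.normal(size=(1000, 16))
text = rng.normal(size=16)
idx, scores = query_map(feats, text)
```

Because the query is just a similarity ranking against precomputed features, new text prompts can be answered at interactive rates without touching the 3D reconstruction again — the property that makes these representations "queryable."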
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.