OVO-SLAM: Open-Vocabulary Online Simultaneous Localization and Mapping
- URL: http://arxiv.org/abs/2411.15043v1
- Date: Fri, 22 Nov 2024 16:25:05 GMT
- Title: OVO-SLAM: Open-Vocabulary Online Simultaneous Localization and Mapping
- Authors: Tomas Berriel Martins, Martin R. Oswald, Javier Civera
- Abstract summary: This paper presents the first Open-Vocabulary Online 3D semantic SLAM pipeline, which we denote OVO-SLAM.
We detect and track 3D segments, describing them with CLIP vectors computed through a novel aggregation over the viewpoints in which each segment is observed.
- Abstract: This paper presents the first Open-Vocabulary Online 3D semantic SLAM pipeline, which we denote OVO-SLAM. Our primary contribution is the pipeline itself, particularly the mapping thread. Given a set of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors computed through a novel aggregation over the viewpoints where these 3D segments are observed. Notably, our OVO-SLAM pipeline is not only faster but also achieves better segmentation metrics than offline approaches in the literature. Along with this superior segmentation performance, we show experimental results of our contributions integrated with Gaussian-SLAM, the first to demonstrate end-to-end open-vocabulary online 3D reconstruction without relying on ground-truth camera poses or scene geometry.
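The aggregation step lends itself to a compact illustration. Below is a minimal Python/NumPy sketch of fusing per-viewpoint CLIP vectors into a single segment descriptor and scoring it against text queries; the visibility-based weighting (`view_weights`) is an illustrative assumption, not the paper's exact aggregation scheme.

```python
# Minimal sketch: fusing per-viewpoint CLIP descriptors for one 3D segment.
# The visibility-based weighting is an illustrative assumption, not the
# paper's exact aggregation scheme.
import numpy as np

def aggregate_clip_descriptors(view_features: np.ndarray,
                               view_weights: np.ndarray) -> np.ndarray:
    """Fuse per-viewpoint CLIP vectors (V, D) into one unit-norm descriptor.

    view_features: one L2-normalized CLIP vector per observing viewpoint.
    view_weights:  per-view confidence, e.g. how many of the segment's points
                   are visible in that view (assumption for illustration).
    """
    w = view_weights / view_weights.sum()             # normalize the weights
    fused = (w[:, None] * view_features).sum(axis=0)  # weighted average
    return fused / np.linalg.norm(fused)              # back onto the unit sphere

def open_vocab_score(segment_desc: np.ndarray,
                     text_embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity of one segment descriptor against query text embeddings."""
    return text_embeddings @ segment_desc

# Toy usage: 3 observing viewpoints in a 512-D CLIP space.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 512))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
desc = aggregate_clip_descriptors(feats, np.array([120.0, 40.0, 300.0]))
scores = open_vocab_score(desc, feats)  # stand-in queries for illustration
```

Open-vocabulary queries then reduce to picking the text embedding with the highest cosine similarity to each segment descriptor.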
Related papers
- LinPrim: Linear Primitives for Differentiable Volumetric Rendering [53.780682194322225]
We introduce two new scene representations based on linear primitives, octahedra and tetrahedra, both of which define homogeneous volumes bounded by triangular faces.
This formulation aligns naturally with standard mesh-based tools, minimizing overhead for downstream applications.
We demonstrate comparable performance to state-of-the-art volumetric methods while requiring fewer primitives to achieve similar reconstruction fidelity.
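To make the primitive concrete, here is a hedged sketch of one such linear primitive: a unit octahedron bounded by eight triangular faces, with a volume check via the divergence theorem. The vertex/face layout is standard geometry, not the paper's implementation.

```python
# Illustrative sketch of one linear primitive as described in the abstract: a
# unit octahedron, a homogeneous volume bounded by eight triangular faces.
import numpy as np

# Six vertices on the coordinate axes.
VERTS = np.array([[ 1, 0, 0], [-1, 0, 0],
                  [ 0, 1, 0], [ 0,-1, 0],
                  [ 0, 0, 1], [ 0, 0,-1]], dtype=float)

# Eight triangular faces, indexing into VERTS (counter-clockwise from outside).
FACES = np.array([[0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
                  [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5]])

def volume(verts: np.ndarray, faces: np.ndarray) -> float:
    """Volume via the divergence theorem: sum of signed tetrahedra to the origin."""
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return float(np.einsum('ij,ij->i', a, np.cross(b, c)).sum() / 6.0)

print(volume(VERTS, FACES))  # 4/3 for the unit octahedron
```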
arXiv Detail & Related papers (2025-01-27T18:49:38Z) - PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM [105.01907579424362]
PanoSLAM is the first SLAM system to integrate geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation within a unified framework.
For the first time, it achieves panoptic 3D reconstruction of open-world environments directly from RGB-D video.
arXiv Detail & Related papers (2024-12-31T08:58:10Z) - GLS: Geometry-aware 3D Language Gaussian Splatting [16.13929985676661]
This paper presents a unified framework of surface reconstruction and open-vocabulary segmentation based on 3DGS.
For indoor surface reconstruction, we introduce a surface normal prior as a geometric cue to guide the rendered normals, and use the normal error to optimize the rendered depth.
For open-vocabulary segmentation, we employ 2D CLIP features to guide instance features and utilize DEVA masks to enhance their view consistency.
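A minimal sketch of the normal-prior idea follows: derive normals from a rendered depth map and penalize their deviation from a prior normal map (e.g. from a monocular normal estimator). The finite-difference unprojection and cosine penalty are common choices assumed here, not necessarily GLS's exact formulation.

```python
# Sketch: normals from a depth map, and a normal-error term against a prior.
import numpy as np

def normals_from_depth(depth: np.ndarray, fx: float, fy: float) -> np.ndarray:
    """Per-pixel normals from depth via cross products of view-space gradients."""
    h, w = depth.shape
    # Back-project to camera space (principal point at image center: assumption).
    u, v = np.meshgrid(np.arange(w) - w / 2, np.arange(h) - h / 2)
    pts = np.stack([u * depth / fx, v * depth / fy, depth], axis=-1)
    dx = np.gradient(pts, axis=1)   # horizontal tangent
    dy = np.gradient(pts, axis=0)   # vertical tangent
    n = np.cross(dx, dy)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)

def normal_error(depth, prior_normals, fx, fy) -> float:
    """Mean (1 - |cosine similarity|) between depth-derived and prior normals."""
    n = normals_from_depth(depth, fx, fy)
    return float(np.mean(1.0 - np.abs((n * prior_normals).sum(axis=-1))))
```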
arXiv Detail & Related papers (2024-11-27T05:21:34Z) - HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction [38.47566815670662]
HI-SLAM2 is a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input.
We demonstrate significant improvements over existing Neural SLAM methods and even surpass RGB-D-based methods in both reconstruction and rendering quality.
arXiv Detail & Related papers (2024-11-27T01:39:21Z) - DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experiments achieve state-of-the-art tracking performance on both synthetic and real-world data.
arXiv Detail & Related papers (2023-11-30T21:34:44Z) - ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
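As a rough sketch of such incremental fusion, the snippet below maintains per-voxel class probabilities as a running average over frames; the flat voxel dictionary is an illustrative stand-in for the paper's local spatio-temporal expert model.

```python
# Sketch of online semantic fusion: each incoming RGB-D frame contributes
# per-point class probabilities that are fused per voxel by a running mean.
import numpy as np

class OnlineSemanticMap:
    def __init__(self, num_classes: int, voxel_size: float = 0.05):
        self.k, self.voxel = num_classes, voxel_size
        self.prob: dict[tuple, np.ndarray] = {}   # voxel key -> class probs
        self.count: dict[tuple, int] = {}

    def integrate(self, points: np.ndarray, probs: np.ndarray) -> None:
        """Fuse one frame: world-space points (N, 3) with class probs (N, K)."""
        keys = np.floor(points / self.voxel).astype(int)
        for key, p in zip(map(tuple, keys), probs):
            c = self.count.get(key, 0)
            old = self.prob.get(key, np.zeros(self.k))
            self.prob[key] = (old * c + p) / (c + 1)   # running mean
            self.count[key] = c + 1

    def labels(self) -> dict:
        """Current hard label per voxel, usable at any time during the stream."""
        return {k: int(np.argmax(p)) for k, p in self.prob.items()}
```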
arXiv Detail & Related papers (2023-11-29T20:30:18Z) - NEAT: Distilling 3D Wireframes from Neural Attraction Fields [52.90572335390092]
This paper studies the problem of structured 3D reconstruction using wireframes that consist of line segments and junctions.
The proposed NEAT jointly optimizes the neural attraction fields and global 3D junctions from scratch, without requiring explicit matching across views.
arXiv Detail & Related papers (2023-07-14T07:25:47Z) - Weakly Supervised 3D Open-vocabulary Segmentation [104.07740741126119]
We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner.
We distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF).
A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process.
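The distillation objective can be sketched as a per-ray feature loss: the feature volume-rendered along each ray is pulled toward the CLIP or DINO feature at the corresponding pixel, with no segmentation labels involved. The cosine form below is a common choice assumed for illustration; the field architecture and rendering are elided.

```python
# Sketch of a feature-distillation loss for a NeRF-style field.
import numpy as np

def distillation_loss(rendered_feat: np.ndarray,
                      teacher_feat: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between rendered and teacher features.

    rendered_feat: (R, D) features volume-rendered along R rays.
    teacher_feat:  (R, D) CLIP or DINO features at the rays' pixels.
    No manual segmentation labels enter the loss, matching the weak supervision.
    """
    r = rendered_feat / (np.linalg.norm(rendered_feat, axis=1, keepdims=True) + 1e-8)
    t = teacher_feat / (np.linalg.norm(teacher_feat, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - (r * t).sum(axis=1)))
```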
arXiv Detail & Related papers (2023-05-23T14:16:49Z) - On the descriptive power of LiDAR intensity images for segment-based loop closing in 3-D SLAM [7.310043452300736]
We propose an extension to the segment-based global localization method for LiDAR SLAM, using descriptors that are learned from the visual context of the segments.
A new deep neural network architecture is presented that learns this visual context from synthetic LiDAR intensity images.
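A hedged sketch of how such descriptors could drive loop-closure retrieval: compare the current scan's segment descriptors against the map's and keep high-similarity pairs as candidates. The descriptor network itself is elided, and the threshold is illustrative.

```python
# Sketch of segment-based loop-closure retrieval from learned descriptors.
import numpy as np

def loop_closure_candidates(scan_desc: np.ndarray, map_desc: np.ndarray,
                            sim_thresh: float = 0.8) -> list[tuple[int, int]]:
    """Return (scan_segment, map_segment) pairs with high cosine similarity.

    scan_desc: (Ns, D) descriptors of segments in the current scan.
    map_desc:  (Nm, D) descriptors of segments already in the map.
    """
    s = scan_desc / np.linalg.norm(scan_desc, axis=1, keepdims=True)
    m = map_desc / np.linalg.norm(map_desc, axis=1, keepdims=True)
    sim = s @ m.T                                  # (Ns, Nm) similarity matrix
    best = sim.argmax(axis=1)                      # best map match per segment
    return [(i, int(j)) for i, j in enumerate(best) if sim[i, j] > sim_thresh]
```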
arXiv Detail & Related papers (2021-08-03T09:44:23Z) - Line Flow based SLAM [36.10943109853581]
We propose a visual SLAM method by predicting and updating line flows that represent sequential 2D projections of 3D line segments.
The proposed method achieves state-of-the-art results through its use of line flows.
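To illustrate the idea, the sketch below models a line flow as the sequence of 2D endpoint pairs of one 3D segment, with a constant-velocity prediction and a nearest-detection update; both rules are simplifying assumptions, not the paper's exact predict/update formulation.

```python
# Sketch of a line flow: sequential 2D projections of one 3D line segment.
import numpy as np

class LineFlow:
    def __init__(self, endpoints: np.ndarray):
        self.history = [endpoints]          # each item: (2, 2) endpoint pair

    def predict(self) -> np.ndarray:
        """Extrapolate endpoints assuming constant 2D velocity (assumption)."""
        if len(self.history) < 2:
            return self.history[-1]
        return 2 * self.history[-1] - self.history[-2]

    def update(self, detections: np.ndarray, gate: float = 20.0) -> None:
        """Match the prediction to the closest detected segment, shape (N, 2, 2)."""
        pred = self.predict()
        d = np.linalg.norm(detections - pred, axis=(1, 2))
        best = int(d.argmin())
        # Extend the flow only if the detection falls inside the gating distance.
        self.history.append(detections[best] if d[best] < gate else pred)
```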
arXiv Detail & Related papers (2020-09-21T15:55:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.