OVO-SLAM: Open-Vocabulary Online Simultaneous Localization and Mapping
- URL: http://arxiv.org/abs/2411.15043v1
- Date: Fri, 22 Nov 2024 16:25:05 GMT
- Title: OVO-SLAM: Open-Vocabulary Online Simultaneous Localization and Mapping
- Authors: Tomas Berriel Martins, Martin R. Oswald, Javier Civera
- Abstract summary: This paper presents the first Open-Vocabulary Online 3D semantic SLAM pipeline, which we denote OVO-SLAM.
We detect and track 3D segments, which we describe using CLIP vectors, calculated through a novel aggregation from the viewpoints where these 3D segments are observed.
- Score: 21.254743678057356
- License:
- Abstract: This paper presents the first Open-Vocabulary Online 3D semantic SLAM pipeline, which we denote OVO-SLAM. Our primary contribution is the pipeline itself, particularly the mapping thread. Given a set of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors calculated through a novel aggregation from the viewpoints where these 3D segments are observed. Notably, our OVO-SLAM pipeline is not only faster but also achieves better segmentation metrics than offline approaches in the literature. Along with superior segmentation performance, we show experimental results of our contributions integrated with Gaussian-SLAM, making us the first to demonstrate end-to-end open-vocabulary online 3D reconstructions without relying on ground-truth camera poses or scene geometry.
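The abstract describes each 3D segment with a single CLIP vector aggregated from the viewpoints in which the segment is observed, but it does not spell out the aggregation. The sketch below is only an illustration of the general idea, not the paper's method: it assumes a plain weighted average of unit-normalized per-view descriptors, followed by an open-vocabulary query via cosine similarity. The function names, the per-view weights, and the toy 3-dimensional vectors standing in for CLIP embeddings are all hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def aggregate_descriptors(view_descriptors, view_weights):
    """Fuse per-view descriptors of one 3D segment into a single unit vector.

    Assumption: a weighted mean of unit-normalized descriptors. This is a
    stand-in for the paper's aggregation, which the abstract does not detail.
    """
    if not view_descriptors or len(view_descriptors) != len(view_weights):
        raise ValueError("need one weight per view descriptor")
    acc = [0.0] * len(view_descriptors[0])
    for desc, weight in zip(view_descriptors, view_weights):
        for i, x in enumerate(normalize(desc)):
            acc[i] += weight * x
    return normalize(acc)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def classify_segment(segment_descriptor, text_embeddings):
    """Return the label whose text embedding is most similar to the segment."""
    return max(
        text_embeddings,
        key=lambda label: cosine(segment_descriptor, text_embeddings[label]),
    )


# Toy data: three views of one segment (hypothetical values).
views = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.0]]
# Hypothetical per-view weights, e.g. the fraction of the segment visible.
weights = [0.5, 0.4, 0.1]
# Hypothetical text embeddings for an open-vocabulary query.
queries = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0], "lamp": [0.0, 0.0, 1.0]}

seg = aggregate_descriptors(views, weights)
label = classify_segment(seg, queries)
print(label)
```

With these toy values the two heavily weighted views dominate the fused descriptor, so the segment is matched to "chair" even though one low-weight view points toward "table". In a real system the vectors would come from a CLIP image and text encoder rather than being written by hand.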
Related papers
- Split-and-Fit: Learning B-Reps via Structure-Aware Voronoi Partitioning [50.684254969269546]
We introduce a novel method for acquiring boundary representations (B-Reps) of 3D CAD models.
We apply a spatial partitioning to derive a single primitive within each partition.
We show that our network, coined NVD-Net for neural Voronoi diagrams, can effectively learn Voronoi partitions for CAD models from training data.
arXiv Detail & Related papers (2024-06-07T21:07:49Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting [51.96353586773191]
We introduce GS-SLAM, which is the first to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping system.
Our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering.
Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets.
arXiv Detail & Related papers (2023-11-20T12:08:23Z)
- NEAT: Distilling 3D Wireframes from Neural Attraction Fields [52.90572335390092]
This paper studies the problem of structured 3D reconstruction using wireframes that consist of line segments and junctions.
NEAT jointly optimizes the neural fields and the junctions from scratch, without cross-view feature matching.
arXiv Detail & Related papers (2023-07-14T07:25:47Z)
- Weakly Supervised 3D Open-vocabulary Segmentation [104.07740741126119]
We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner.
We distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF).
A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process.
arXiv Detail & Related papers (2023-05-23T14:16:49Z)
- IDLS: Inverse Depth Line based Visual-Inertial SLAM [9.38589798999922]
Inverse Depth Line SLAM (IDLS) is proposed to track line features in SLAM accurately and efficiently.
IDLS is extensively evaluated on multiple perceptually challenging datasets.
arXiv Detail & Related papers (2023-04-23T20:53:05Z)
- ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields [2.0625936401496237]
ESLAM reads RGB-D frames with unknown camera poses in a sequential manner and incrementally reconstructs the scene representation.
ESLAM improves the accuracy of 3D reconstruction and camera localization of state-of-the-art dense visual SLAM methods by more than 50%.
arXiv Detail & Related papers (2022-11-21T18:25:14Z)
- Detecting Line Segments in Motion-blurred Images with Events [38.39698414942873]
Existing line segment detection methods suffer severe performance degradation when motion blur occurs.
We propose to leverage the complementary information of images and events to robustly detect line segments under motion blur.
Our method achieves 63.3% mean structural average precision (msAP) with the model pre-trained on the FE-Wireframe and fine-tuned on the FE-Blurframe.
arXiv Detail & Related papers (2022-11-14T14:00:03Z)
- On the descriptive power of LiDAR intensity images for segment-based loop closing in 3-D SLAM [7.310043452300736]
We propose an extension to the segment-based global localization method for LiDAR SLAM using descriptors learned considering the visual context of the segments.
A new architecture of the deep neural network is presented that learns the visual context acquired from synthetic LiDAR intensity images.
arXiv Detail & Related papers (2021-08-03T09:44:23Z)
- Line Flow based SLAM [36.10943109853581]
We propose a visual SLAM method by predicting and updating line flows that represent sequential 2D projections of 3D line segments.
The proposed method achieves state-of-the-art results due to the utilization of line flows.
arXiv Detail & Related papers (2020-09-21T15:55:45Z)