Volume-DROID: A Real-Time Implementation of Volumetric Mapping with
DROID-SLAM
- URL: http://arxiv.org/abs/2306.06850v1
- Date: Mon, 12 Jun 2023 03:50:50 GMT
- Title: Volume-DROID: A Real-Time Implementation of Volumetric Mapping with
DROID-SLAM
- Authors: Peter Stratton, Sandilya Sai Garimella, Ashwin Saxena, Nibarkavi
Amutha, Emaad Gerami
- Abstract summary: Volume-DROID is a novel approach for Simultaneous Localization and Mapping (SLAM).
It combines DROID-SLAM, point cloud registration, an off-the-shelf semantic segmentation network, and Convolutional Bayesian Kernel Inference (ConvBKI).
The key innovation of our method is the real-time fusion of DROID-SLAM and ConvBKI.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Volume-DROID, a novel approach for Simultaneous
Localization and Mapping (SLAM) that integrates Volumetric Mapping and
Differentiable Recurrent Optimization-Inspired Design (DROID). Volume-DROID
takes camera images (monocular or stereo) or frames from a video as input and
combines DROID-SLAM, point cloud registration, an off-the-shelf semantic
segmentation network, and Convolutional Bayesian Kernel Inference (ConvBKI) to
generate a 3D semantic map of the environment and provide accurate localization
for the robot. The key innovation of our method is the real-time fusion of
DROID-SLAM and Convolutional Bayesian Kernel Inference (ConvBKI), achieved
through the introduction of point cloud generation from RGB-Depth frames and
optimized camera poses. This integration, engineered to enable efficient and
timely processing, minimizes lag and ensures effective performance of the
system. Our approach enables functional real-time online semantic mapping from
camera images or stereo video input alone. Our paper offers an open-source
Python implementation of the algorithm, available at
https://github.com/peterstratton/Volume-DROID.
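The bridging step the abstract singles out, generating a world-frame point cloud from an RGB-D frame and an optimized camera pose, reduces to a standard pinhole back-projection followed by a rigid-body transform. The following is a minimal sketch of that step only; the function name `backproject_depth` and the argument names (`K`, `T_wc`) are our own illustrative choices, not the API of the linked repository.

```python
import numpy as np

def backproject_depth(depth, K, T_wc):
    """Back-project a depth image into a world-frame point cloud.

    depth : (H, W) depth map in meters (from an RGB-D frame).
    K     : (3, 3) camera intrinsics matrix.
    T_wc  : (4, 4) camera-to-world pose, e.g. an optimized pose
            as produced by a DROID-SLAM-style back-end (assumed here).
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Pixel grid: one (u, v) coordinate per depth sample.
    u, v = np.meshgrid(np.arange(W), np.arange(H))

    # Keep only pixels with valid (positive) depth.
    valid = depth > 0
    z = depth[valid]

    # Pinhole back-projection into the camera frame.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)  # (4, N) homogeneous

    # Rigid transform into the world frame using the optimized pose.
    pts_world = (T_wc @ pts_cam)[:3].T  # (N, 3)
    return pts_world
```

Each such point cloud, paired with per-pixel labels from the semantic segmentation network, is what a ConvBKI-style module would then fuse into the running volumetric semantic map.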
Related papers
- SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos [33.57444419305241]
SLAM3R is a novel system for real-time, high-quality, dense 3D reconstruction using RGB videos.
It seamlessly integrates local 3D reconstruction and global coordinate registration through feed-forward neural networks.
It achieves state-of-the-art reconstruction accuracy and completeness while maintaining real-time performance at 20+ FPS.
arXiv Detail & Related papers (2024-12-12T16:08:03Z)
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It takes visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy.
A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- Multiway Point Cloud Mosaicking with Diffusion and Global Optimization [74.3802812773891]
We introduce a novel framework for multiway point cloud mosaicking (named Wednesday).
At the core of our approach is ODIN, a learned pairwise registration algorithm that identifies overlaps and refines attention scores.
Tested on four diverse, large-scale datasets, our method achieves state-of-the-art pairwise and rotation registration results by a large margin on all benchmarks.
arXiv Detail & Related papers (2024-03-30T17:29:13Z)
- Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting [24.160436463991495]
We present a dense simultaneous localization and mapping (SLAM) method that uses 3D Gaussians as a scene representation.
Our approach enables interactive-time reconstruction and photo-realistic rendering from real-world single-camera RGBD videos.
arXiv Detail & Related papers (2023-12-06T10:47:53Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation [3.565151496245487]
We use Neural Radiance Fields as an implicit map of a given scene and propose a camera relocalization tailored for this representation.
The proposed method enables real-time computation of a device's precise position from a single RGB camera during navigation.
arXiv Detail & Related papers (2023-03-08T20:22:08Z)
- DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception.
Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z)
- Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z)
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blur caused by camera shake and object motion.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- Real-time Dense Reconstruction of Tissue Surface from Stereo Optical Video [10.181846237133167]
We propose an approach to reconstruct a dense three-dimensional (3D) model of tissue surface from stereo optical videos in real time.
The basic idea is to first extract 3D information from video frames using stereo matching, and then mosaic the reconstructed 3D models (see the sketch after this entry).
Experimental results on ex vivo and in vivo data showed that the reconstructed 3D models have high-resolution texture with an accuracy error of less than 2 mm.
arXiv Detail & Related papers (2020-07-16T19:14:05Z)
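The stereo-matching step mentioned in the entry above rests on the standard disparity-to-depth relation z = f·B/d for a rectified stereo pair. A minimal sketch, where the names `disparity`, `focal_px`, and `baseline_m` are our own illustrative assumptions rather than anything from the cited paper:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a stereo disparity map (in pixels) to metric depth via z = f*B/d.

    disparity  : (H, W) disparity map from a rectified stereo pair.
    focal_px   : focal length in pixels.
    baseline_m : stereo baseline in meters.
    """
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0  # zero disparity means no match (or a point at infinity)
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

The resulting depth maps are what per-frame 3D models would be built from before mosaicking them into a single surface.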