Volume-DROID: A Real-Time Implementation of Volumetric Mapping with
DROID-SLAM
- URL: http://arxiv.org/abs/2306.06850v1
- Date: Mon, 12 Jun 2023 03:50:50 GMT
- Title: Volume-DROID: A Real-Time Implementation of Volumetric Mapping with
DROID-SLAM
- Authors: Peter Stratton, Sandilya Sai Garimella, Ashwin Saxena, Nibarkavi
Amutha, Emaad Gerami
- Abstract summary: Volume-DROID is a novel approach for Simultaneous Localization and Mapping (SLAM).
It combines DROID-SLAM, point cloud registration, an off-the-shelf semantic segmentation network, and Convolutional Bayesian Kernel Inference (ConvBKI).
The key innovation of our method is the real-time fusion of DROID-SLAM and ConvBKI.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Volume-DROID, a novel approach for Simultaneous
Localization and Mapping (SLAM) that integrates Volumetric Mapping and
Differentiable Recurrent Optimization-Inspired Design (DROID). Volume-DROID
takes camera images (monocular or stereo) or frames from a video as input and
combines DROID-SLAM, point cloud registration, an off-the-shelf semantic
segmentation network, and Convolutional Bayesian Kernel Inference (ConvBKI) to
generate a 3D semantic map of the environment and provide accurate localization
for the robot. The key innovation of our method is the real-time fusion of
DROID-SLAM and Convolutional Bayesian Kernel Inference (ConvBKI), achieved
through the introduction of point cloud generation from RGB-Depth frames and
optimized camera poses. This integration, engineered to enable efficient and
timely processing, minimizes lag and ensures effective performance of the
system. Our approach enables functional real-time online semantic mapping from
camera images or stereo video input alone. Our paper offers an open-source
Python implementation of the algorithm, available at
https://github.com/peterstratton/Volume-DROID.
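The bridging step the abstract singles out, generating a world-frame point cloud from an RGB-D frame and an optimized camera pose, reduces to a standard pinhole back-projection followed by a rigid-body transform. The following is a minimal sketch of that step only; the function name `backproject_depth` and the argument names (`K`, `T_wc`) are our own illustrative choices, not the API of the linked repository.

```python
import numpy as np

def backproject_depth(depth, K, T_wc):
    """Back-project a depth image into a world-frame point cloud.

    depth : (H, W) depth map in meters (from an RGB-D frame).
    K     : (3, 3) camera intrinsics matrix.
    T_wc  : (4, 4) camera-to-world pose, e.g. an optimized pose
            as produced by a DROID-SLAM-style back-end (assumed here).
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Pixel grid: one (u, v) coordinate per depth sample.
    u, v = np.meshgrid(np.arange(W), np.arange(H))

    # Keep only pixels with valid (positive) depth.
    valid = depth > 0
    z = depth[valid]

    # Pinhole back-projection into the camera frame.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)  # (4, N) homogeneous

    # Rigid transform into the world frame using the optimized pose.
    pts_world = (T_wc @ pts_cam)[:3].T  # (N, 3)
    return pts_world
```

Each such point cloud, paired with per-pixel labels from the semantic segmentation network, is what a ConvBKI-style module would then fuse into the running volumetric semantic map.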
Related papers
- SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos [33.57444419305241]
SLAM3R is a novel system for real-time, high-quality, dense 3D reconstruction using RGB videos.
It seamlessly integrates local 3D reconstruction and global coordinate registration through feed-forward neural networks.
It achieves state-of-the-art reconstruction accuracy and completeness while maintaining real-time performance at 20+ FPS.
arXiv Detail & Related papers (2024-12-12T16:08:03Z)
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It takes visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy.
A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- Multiway Point Cloud Mosaicking with Diffusion and Global Optimization [74.3802812773891]
We introduce a novel framework for multiway point cloud mosaicking (named Wednesday).
At the core of our approach is ODIN, a learned pairwise registration algorithm that identifies overlaps and refines attention scores.
Tested on four diverse, large-scale datasets, our method achieves state-of-the-art pairwise and rotation registration results by a large margin on all benchmarks.
arXiv Detail & Related papers (2024-03-30T17:29:13Z)
- Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting [24.160436463991495]
We present a dense simultaneous localization and mapping (SLAM) method that uses 3D Gaussians as a scene representation.
Our approach enables interactive-time reconstruction and photo-realistic rendering from real-world single-camera RGBD videos.
arXiv Detail & Related papers (2023-12-06T10:47:53Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation [3.565151496245487]
We use Neural Radiance Fields as an implicit map of a given scene and propose a camera relocalization tailored for this representation.
The proposed method enables real-time computation of a device's precise position from a single RGB camera during navigation.
arXiv Detail & Related papers (2023-03-08T20:22:08Z)
- DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception.
Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z)
- Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z)
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blur caused by camera shake and object motion.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- Real-time Dense Reconstruction of Tissue Surface from Stereo Optical Video [10.181846237133167]
We propose an approach to reconstruct a dense three-dimensional (3D) model of tissue surface from stereo optical videos in real time.
The basic idea is to first extract 3D information from video frames using stereo matching, and then mosaic the reconstructed 3D models (see the sketch after this entry).
Experimental results on ex vivo and in vivo data showed that the reconstructed 3D models have high-resolution texture with an accuracy error of less than 2 mm.
arXiv Detail & Related papers (2020-07-16T19:14:05Z)
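The stereo-matching step mentioned in the entry above rests on the standard disparity-to-depth relation z = f·B/d for a rectified stereo pair. A minimal sketch, where the names `disparity`, `focal_px`, and `baseline_m` are our own illustrative assumptions rather than anything from the cited paper:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a stereo disparity map (in pixels) to metric depth via z = f*B/d.

    disparity  : (H, W) disparity map from a rectified stereo pair.
    focal_px   : focal length in pixels.
    baseline_m : stereo baseline in meters.
    """
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0  # zero disparity means no match (or a point at infinity)
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

The resulting depth maps are what per-frame 3D models would be built from before mosaicking them into a single surface.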