Voxel Map for Visual SLAM
- URL: http://arxiv.org/abs/2003.02247v1
- Date: Wed, 4 Mar 2020 18:39:14 GMT
- Title: Voxel Map for Visual SLAM
- Authors: Manasi Muglikar, Zichao Zhang and Davide Scaramuzza
- Abstract summary: We propose a voxel-map representation to efficiently map points for visual SLAM.
Points retrieved with our method are geometrically guaranteed to fall in the camera field-of-view, and occluded points can be identified and removed to a certain extent.
Experimental results show that our voxel map representation is as efficient as a keyframe map with 5 keyframes and provides significantly higher localization accuracy (average 46% improvement in RMSE) on the EuRoC dataset.
- Score: 57.07800982410967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern visual SLAM systems, it is a standard practice to retrieve
potential candidate map points from overlapping keyframes for further feature
matching or direct tracking. In this work, we argue that keyframes are not the
optimal choice for this task, due to several inherent limitations, such as weak
geometric reasoning and poor scalability. We propose a voxel-map representation
to efficiently retrieve map points for visual SLAM. In particular, we organize
the map points in a regular voxel grid. Visible points from a camera pose are
queried by sampling the camera frustum in a raycasting manner, which can be
done in constant time using an efficient voxel hashing method. Compared with
keyframes, the points retrieved with our method are geometrically guaranteed
to fall in the camera field-of-view, and occluded points can be identified and
removed to a certain extent. This method also naturally scales up to large
scenes and complicated multi-camera configurations. Experimental results show
that our voxel map representation is as efficient as a keyframe map with 5
keyframes and provides significantly higher localization accuracy (average 46%
improvement in RMSE) on the EuRoC dataset. The proposed voxel-map
representation is a general approach to a fundamental functionality in visual
SLAM and widely applicable.
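As a concrete illustration of the retrieval pipeline described in the abstract, below is a minimal Python sketch of a hashed voxel map with a frustum raycasting query. The class name, voxel size, pixel stride, and the simple depth-marching loop are illustrative assumptions rather than the authors' implementation; only the overall idea (a hash from integer voxel coordinates to map points, queried by casting rays through the camera frustum) follows the abstract.

```python
# Minimal sketch of a hashed voxel map with a frustum raycasting query.
# All names and parameters here (VoxelMap, voxel_size, stride, the simple
# depth-marching loop) are illustrative assumptions, not the paper's code.
import numpy as np
from collections import defaultdict


class VoxelMap:
    def __init__(self, voxel_size=0.2):
        self.voxel_size = voxel_size
        # Spatial hash: integer voxel coordinate -> list of 3D map points.
        # Dictionary lookup gives the constant-time access the abstract
        # attributes to voxel hashing.
        self.voxels = defaultdict(list)

    def _key(self, p):
        return tuple(np.floor(p / self.voxel_size).astype(int))

    def insert(self, point):
        point = np.asarray(point, dtype=float)
        self.voxels[self._key(point)].append(point)

    def query_frustum(self, T_wc, K, width, height, max_depth=10.0, stride=32):
        """Return map points visible from camera pose T_wc (4x4, camera-to-world).

        The frustum is sampled by casting a ray through every `stride`-th
        pixel; each ray is marched in voxel-size steps and stops at the first
        occupied voxel, so points hidden behind it are (approximately) culled.
        """
        K_inv = np.linalg.inv(K)
        R, t = T_wc[:3, :3], T_wc[:3, 3]
        visible, seen = [], set()
        for v in range(0, height, stride):
            for u in range(0, width, stride):
                d = R @ (K_inv @ np.array([u, v, 1.0]))  # ray in world frame
                d /= np.linalg.norm(d)
                depth = self.voxel_size
                while depth < max_depth:
                    key = self._key(t + depth * d)
                    if key in self.voxels:
                        if key not in seen:
                            seen.add(key)
                            visible.extend(self.voxels[key])
                        break  # occupied voxel occludes everything behind it
                    depth += self.voxel_size
        return visible


# Toy usage: one landmark about 2 m in front of a camera at the origin.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
vmap = VoxelMap(voxel_size=0.2)
vmap.insert([0.05, 0.05, 2.05])
print(vmap.query_frustum(np.eye(4), K, 640, 480))  # -> [array([0.05, 0.05, 2.05])]
```

Stopping each ray at the first occupied voxel is what lets the query drop some occluded points, mirroring the abstract's claim that occlusion can be handled "to a certain extent".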
Related papers
- FaVoR: Features via Voxel Rendering for Camera Relocalization [23.7893950095252]
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image.
We propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking.
arXiv Detail & Related papers (2024-09-11T18:58:16Z)
- Representing 3D sparse map points and lines for camera relocalization [1.2974519529978974]
We show how a lightweight neural network can learn to represent both 3D point and line features.
In tests, our method achieves the largest improvement over state-of-the-art learning-based methods.
arXiv Detail & Related papers (2024-02-28T03:07:05Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Dense RGB SLAM with Neural Implicit Maps [34.37572307973734]
We present a dense RGB SLAM method with neural implicit map representation.
Our method simultaneously solves the camera motion and the neural implicit map by matching the rendered and input video frames.
Our method achieves more favorable results than previous methods and even surpasses some recent RGB-D SLAM methods.
arXiv Detail & Related papers (2023-01-21T09:54:07Z)
- HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images [58.720142291102135]
We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments.
The dataset is based on the popular Habitat simulator, in which indoor scenes can be generated using both its own sensor data and open datasets.
arXiv Detail & Related papers (2022-12-30T12:20:56Z)
- Pixel-Perfect Structure-from-Motion with Featuremetric Refinement [96.73365545609191]
We refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views.
This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors.
Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.
arXiv Detail & Related papers (2021-08-18T17:58:55Z)
- Object-Augmented RGB-D SLAM for Wide-Disparity Relocalisation [3.888848425698769]
We propose a novel object-augmented RGB-D SLAM system that is capable of constructing a consistent object map and performing relocalisation based on centroids of objects in the map.
arXiv Detail & Related papers (2021-08-05T11:02:25Z)
- Accurate Grid Keypoint Learning for Efficient Video Prediction [87.71109421608232]
Keypoint-based video prediction methods can consume substantial computing resources in training and deployment.
In this paper, we design a new grid keypoint learning framework, aiming at a robust and explainable intermediate keypoint representation for long-term efficient video prediction.
Our method outperforms state-of-the-art video prediction methods while saving more than 98% of computing resources.
arXiv Detail & Related papers (2021-07-28T05:04:30Z)
- RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation [28.494690309193068]
We propose a novel range-point-voxel fusion network, namely RPVNet.
In this network, we devise a deep fusion framework with multiple and mutual information interactions among these three views.
By leveraging this efficient interaction and a relatively lower voxel resolution, our method is also shown to be more efficient.
arXiv Detail & Related papers (2021-03-24T04:24:12Z)
- Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, a different visual feature is extracted for each guided point at its location.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.