Neural Implicit Dense Semantic SLAM
- URL: http://arxiv.org/abs/2304.14560v2
- Date: Tue, 9 May 2023 13:58:15 GMT
- Title: Neural Implicit Dense Semantic SLAM
- Authors: Yasaman Haghighi, Suryansh Kumar, Jean-Philippe Thiran, Luc Van Gool
- Abstract summary: We propose a novel RGBD vSLAM algorithm that learns a memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
- Score: 83.04331351572277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Simultaneous Localization and Mapping (vSLAM) is a widely used
technique in robotics and computer vision that enables a robot to create a map
of an unfamiliar environment using a camera sensor while simultaneously
tracking its position over time. In this paper, we propose a novel RGBD vSLAM
algorithm that learns a memory-efficient dense 3D geometry and semantic
segmentation of an indoor scene in an online manner. Our pipeline combines
classical 3D vision-based tracking and loop closing with neural fields-based
mapping. The mapping network learns the SDF of the scene as well as RGB, depth,
and semantic maps of any novel view using only a set of keyframes.
Additionally, we extend our pipeline to large scenes by using multiple local
mapping networks. Extensive experiments on well-known benchmark datasets
confirm that our approach provides robust tracking, mapping, and semantic
labeling even with noisy, sparse, or no input depth. Overall, our proposed
algorithm can greatly enhance scene perception and assist with a range of robot
control problems.
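The mapping network the abstract describes is a neural field that, given a 3D point, predicts a signed distance, a color, and a semantic label, all supervised from keyframes. Below is a minimal sketch of such a multi-head coordinate MLP; it is not the authors' code, and the layer sizes, positional-encoding depth, and head design are assumptions for illustration.

```python
# Minimal sketch (not the paper's implementation) of a coordinate MLP that
# maps a 3D point to an SDF value, an RGB color, and semantic-class logits.
import torch
import torch.nn as nn

class MappingField(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256, n_freqs: int = 6):
        super().__init__()
        self.n_freqs = n_freqs
        in_dim = 3 + 3 * 2 * n_freqs          # xyz + sin/cos positional encoding
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sdf_head = nn.Linear(hidden, 1)            # signed distance
        self.rgb_head = nn.Linear(hidden, 3)            # color
        self.sem_head = nn.Linear(hidden, num_classes)  # semantic logits

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Frequency (positional) encoding of the input coordinates.
        feats = [x]
        for k in range(self.n_freqs):
            feats += [torch.sin((2.0 ** k) * x), torch.cos((2.0 ** k) * x)]
        return torch.cat(feats, dim=-1)

    def forward(self, x: torch.Tensor):
        h = self.trunk(self.encode(x))
        sdf = self.sdf_head(h).squeeze(-1)
        rgb = torch.sigmoid(self.rgb_head(h))
        sem_logits = self.sem_head(h)
        return sdf, rgb, sem_logits
```

Rendering RGB, depth, and semantic maps for a novel view would then sample points along each camera ray, convert SDF values into compositing weights, and accumulate the per-point predictions; the paper's extension to large scenes with multiple local mapping networks corresponds to instantiating one such field per spatial region.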
Related papers
- Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations [8.522160106746478]
We present a global visual localization system capable of localizing a single camera image across various 3D map representations.
Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs.
NeRF-synthesized images show superior performance, localizing query images at an average success rate of 72%.
arXiv Detail & Related papers (2024-08-21T19:37:17Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network, built on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using Single-View Depth and Gradient Predictions [22.243043857097582]
DeepFusion is capable of producing real-time dense reconstructions on a GPU.
It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion (see the fusion sketch after this list).
On synthetic and real-world datasets, DeepFusion performs at least as well as other comparable systems.
arXiv Detail & Related papers (2022-07-25T14:55:26Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real-world applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion (see the sketch after this list).
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved, lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
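The DeepFusion entry above describes fusing semi-dense multiview stereo depth with CNN predictions "in a probabilistic fashion". A standard way to realize such fusion, sketched here as an assumption rather than DeepFusion's exact formulation (which also exploits the CNN's gradient predictions), is per-pixel inverse-variance weighting of Gaussian depth estimates:

```python
# Hypothetical per-pixel inverse-variance fusion of two depth maps with
# Gaussian uncertainties; a common probabilistic-fusion recipe, not
# DeepFusion's exact method.
import numpy as np

def fuse_depths(d_stereo, var_stereo, d_cnn, var_cnn):
    """Fuse two depth maps; NaNs in d_stereo mark pixels stereo could not match."""
    w_s = 1.0 / var_stereo
    w_c = 1.0 / var_cnn
    fused = (w_s * d_stereo + w_c * d_cnn) / (w_s + w_c)
    fused_var = 1.0 / (w_s + w_c)
    # Where the semi-dense stereo has no estimate, fall back to the CNN prior.
    missing = np.isnan(d_stereo)
    fused[missing] = d_cnn[missing]
    fused_var[missing] = var_cnn[missing]
    return fused, fused_var
```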
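The self-supervised statistics entry above regresses spatio-temporal summaries such as the spatial location and dominant direction of the largest motion. The sketch below shows one plausible way to derive such targets from dense optical flow; the grid layout and 8-bin direction quantization are illustrative assumptions, not the paper's exact recipe.

```python
# Derive two pretext-task targets from a dense optical flow field:
# the grid block with the largest average motion, and the dominant
# direction of that block's motion, quantized into 8 bins.
import numpy as np

def largest_motion_statistics(flow: np.ndarray, grid: int = 4):
    """flow: (H, W, 2) optical flow. Returns (block_index, direction_bin)."""
    H, W, _ = flow.shape
    mag = np.linalg.norm(flow, axis=-1)
    bh, bw = H // grid, W // grid
    # Average motion magnitude per grid block.
    block_mag = mag[:grid * bh, :grid * bw].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    idx = int(np.argmax(block_mag))          # target 1: spatial location of largest motion
    r, c = divmod(idx, grid)
    block_flow = flow[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
    angle = np.arctan2(block_flow[..., 1].mean(), block_flow[..., 0].mean())
    direction_bin = int((angle + np.pi) / (2 * np.pi) * 8) % 8  # target 2: dominant direction
    return idx, direction_bin
```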
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.