Neural Implicit Dense Semantic SLAM
- URL: http://arxiv.org/abs/2304.14560v2
- Date: Tue, 9 May 2023 13:58:15 GMT
- Title: Neural Implicit Dense Semantic SLAM
- Authors: Yasaman Haghighi, Suryansh Kumar, Jean-Philippe Thiran, Luc Van Gool
- Abstract summary: We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
- Score: 83.04331351572277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Simultaneous Localization and Mapping (vSLAM) is a widely used
technique in robotics and computer vision that enables a robot to create a map
of an unfamiliar environment using a camera sensor while simultaneously
tracking its position over time. In this paper, we propose a novel RGBD vSLAM
algorithm that can learn a memory-efficient, dense 3D geometry, and semantic
segmentation of an indoor scene in an online manner. Our pipeline combines
classical 3D vision-based tracking and loop closing with neural fields-based
mapping. The mapping network learns the SDF of the scene as well as RGB, depth,
and semantic maps of any novel view using only a set of keyframes.
Additionally, we extend our pipeline to large scenes by using multiple local
mapping networks. Extensive experiments on well-known benchmark datasets
confirm that our approach provides robust tracking, mapping, and semantic
labeling even with noisy, sparse, or no input depth. Overall, our proposed
algorithm can greatly enhance scene perception and assist with a range of robot
control problems.
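For readers unfamiliar with the term, the SDF (signed distance function) mentioned in the abstract maps a 3D point to its distance from the nearest surface, negative inside and positive outside. The sketch below is only an illustration of that idea and of how a depth value along a camera ray can be read out of an SDF by sphere tracing. It is not the paper's mapping network: the analytic `sphere_sdf` is a hypothetical stand-in for a learned network, and the names `sphere_trace_depth`, `max_steps`, `eps`, and `far` are assumptions made for this example.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside.
    Hypothetical stand-in for a learned SDF network."""
    return math.dist(p, center) - radius

def sphere_trace_depth(sdf, origin, direction, max_steps=64, eps=1e-4, far=10.0):
    """March along a ray, stepping by the SDF value each time.
    Returns the distance to the first surface hit, or None if the ray misses."""
    norm = math.hypot(*direction)
    direction = tuple(c / norm for c in direction)  # unit-length ray direction
    t = 0.0
    for _ in range(max_steps):
        point = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(point)
        if dist < eps:   # close enough to the surface: report depth
            return t
        t += dist        # safe step: no surface is nearer than dist
        if t > far:      # left the region of interest without a hit
            return None
    return None

# A camera 3 units in front of a unit sphere, looking straight at it.
origin = (0.0, 0.0, -3.0)
depth = sphere_trace_depth(sphere_sdf, origin, (0.0, 0.0, 1.0))
missed = sphere_trace_depth(sphere_sdf, origin, (0.0, 1.0, 0.0))
```

Here `depth` is the distance from the camera to the front of the sphere, and `missed` is `None` because the second ray never reaches the surface. A neural mapping approach would replace the analytic function with a network queried at the same points.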
Related papers
- SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation [50.420711084672966]
We present SliceOcc, an RGB camera-based model specifically tailored for indoor 3D semantic occupancy prediction.
Experimental results on the EmbodiedScan dataset demonstrate that SliceOcc achieves a mIoU of 15.45% across 81 indoor categories.
arXiv Detail & Related papers (2025-01-28T03:41:24Z)
- Volumetric Mapping with Panoptic Refinement via Kernel Density Estimation for Mobile Robots [2.8668675011182967]
Mobile robots usually use lightweight networks to segment objects on RGB images and then localize them via depth maps.
We address the problem of panoptic segmentation quality in 3D scene reconstruction by refining segmentation errors using non-parametric statistical methods.
We map the predicted masks into a depth frame to estimate their distribution via kernel densities.
The outliers in depth perception are then rejected without the need for additional parameters.
arXiv Detail & Related papers (2024-12-15T16:46:23Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using Single-View Depth and Gradient Predictions [22.243043857097582]
DeepFusion is capable of producing real-time dense reconstructions on a GPU.
It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion.
Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion is capable of performing at least as well as other comparable systems.
arXiv Detail & Related papers (2022-07-25T14:55:26Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.