DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using
Single-View Depth and Gradient Predictions
- URL: http://arxiv.org/abs/2207.12244v1
- Date: Mon, 25 Jul 2022 14:55:26 GMT
- Title: DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using
Single-View Depth and Gradient Predictions
- Authors: Tristan Laidlow, Jan Czarnowski, Stefan Leutenegger
- Abstract summary: DeepFusion is capable of producing real-time dense reconstructions on a GPU.
It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion.
Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion is capable of performing at least as well as other comparable systems.
- Score: 22.243043857097582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the keypoint-based maps created by sparse monocular simultaneous
localisation and mapping (SLAM) systems are useful for camera tracking, dense
3D reconstructions may be desired for many robotic tasks. Solutions involving
depth cameras are limited in range and to indoor spaces, and dense
reconstruction systems based on minimising the photometric error between frames
are typically poorly constrained and suffer from scale ambiguity. To address
these issues, we propose a 3D reconstruction system that leverages the output
of a convolutional neural network (CNN) to produce fully dense depth maps for
keyframes that include metric scale.
Our system, DeepFusion, is capable of producing real-time dense
reconstructions on a GPU. It fuses the output of a semi-dense multiview stereo
algorithm with the depth and gradient predictions of a CNN in a probabilistic
fashion, using learned uncertainties produced by the network. While the network
only needs to be run once per keyframe, we are able to optimise for the depth
map with each new frame so as to constantly make use of new geometric
constraints. Based on its performance on synthetic and real-world datasets, we
demonstrate that DeepFusion is capable of performing at least as well as other
comparable systems.
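To make the fusion step concrete, here is a minimal NumPy sketch of the kind of weighted least-squares fusion the abstract describes: semi-dense stereo log-depths, dense CNN log-depth predictions, and CNN log-depth-gradient predictions are combined, each weighted by its (learned) inverse variance. The names, the log-depth parameterisation, and the plain preconditioned gradient-descent solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fuse_keyframe_depth(stereo_log, stereo_mask, stereo_var,
                        cnn_log, cnn_var, grad_x, grad_y, grad_var,
                        iters=300, step=0.4):
    """Fuse semi-dense stereo log-depths with dense CNN log-depth and
    log-depth-gradient predictions by minimising a weighted least-squares
    cost; each term is weighted by its inverse variance."""
    d = cnn_log.copy()                                # initialise from the network
    w_s = np.where(stereo_mask, 1.0 / np.maximum(stereo_var, 1e-12), 0.0)
    w_p, w_g = 1.0 / cnn_var, 1.0 / grad_var
    s = np.where(stereo_mask, stereo_log, 0.0)        # zero where no stereo match
    for _ in range(iters):
        gx = np.zeros_like(d); gx[:, :-1] = d[:, 1:] - d[:, :-1]  # forward differences
        gy = np.zeros_like(d); gy[:-1, :] = d[1:, :] - d[:-1, :]
        rx = w_g * (gx - grad_x)                      # gradient-term residuals
        ry = w_g * (gy - grad_y)
        g = w_s * (d - s) + w_p * (d - cnn_log)       # data-term gradient
        g[:, :-1] -= rx[:, :-1]; g[:, 1:] += rx[:, :-1]   # adjoint of d/dx
        g[:-1, :] -= ry[:-1, :]; g[1:, :] += ry[:-1, :]   # adjoint of d/dy
        d = d - step * g / (w_s + w_p + 4.0 * w_g)    # preconditioned descent step
    return np.exp(d)                                  # metric depth map
```

In the full system, the CNN terms would be computed once per keyframe, while the stereo terms (stereo_log, stereo_var, stereo_mask) are updated and the optimisation re-run as each new frame contributes geometric constraints.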
Related papers
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
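The core of FrozenRecon's test-time optimisation is aligning a frozen model's affine-invariant depth to metric scale. Below is a hedged sketch of just that alignment step (the paper additionally optimises camera parameters and locally varying scale maps; all names here are invented):

```python
import numpy as np

def align_scale_shift(pred_depth, sparse_depth, mask):
    """Closed-form least-squares scale a and shift b mapping a frozen
    network's affine-invariant depth to metric depth, using the pixels
    where sparse metric measurements are available:
        min_{a,b} sum_i (a * pred_i + b - sparse_i)^2"""
    x = pred_depth[mask]
    y = sparse_depth[mask]
    A = np.stack([x, np.ones_like(x)], axis=1)      # [pred, 1] design matrix
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a * pred_depth + b                       # metric-scale depth map

# toy check: four sparse measurements recover scale 2.0 and shift 0.5
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
print(align_scale_shift(pred, 2.0 * pred + 0.5, np.ones((2, 2), dtype=bool)))
```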
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- SimpleMapping: Real-Time Visual-Inertial Dense Mapping with Deep Multi-View Stereo [13.535871843518953]
We present a real-time visual-inertial dense mapping method with high quality using only monocular images and IMU readings.
We propose a sparse point aided stereo neural network (SPA-MVSNet) that can effectively leverage the informative but noisy sparse points from the VIO system.
Our proposed dense mapping system achieves a 39.7% improvement in F-score over existing systems when evaluated on the challenging scenarios of the EuRoC dataset.
arXiv Detail & Related papers (2023-06-14T17:28:45Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- SimpleRecon: 3D Reconstruction Without 3D Convolutions [21.952478592241]
We show how focusing on high quality multi-view depth prediction leads to highly accurate 3D reconstructions using simple off-the-shelf depth fusion.
Our method achieves a significant lead over the current state of the art for depth estimation, and performs close to or better than the state of the art for 3D reconstruction on ScanNet and 7-Scenes.
arXiv Detail & Related papers (2022-08-31T09:46:34Z)
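SimpleRecon's argument is that once multi-view depth is accurate, a standard fusion back end suffices. Here is a minimal NumPy sketch of such "off-the-shelf" fusion, in this case vanilla TSDF integration (volume layout, names, and parameters are illustrative, not SimpleRecon's code):

```python
import numpy as np

def tsdf_fuse(depth_maps, poses, K, dims=(64, 64, 64),
              voxel=0.05, origin=(-1.6, -1.6, 0.0), trunc=0.15):
    """Integrate depth maps into a truncated signed distance volume.
    poses: camera-from-world 4x4 matrices; K: 3x3 intrinsics."""
    n = int(np.prod(dims))
    tsdf, weight = np.ones(n, np.float32), np.zeros(n, np.float32)
    ii, jj, kk = np.meshgrid(*[np.arange(s) for s in dims], indexing="ij")
    pts_w = np.stack([ii, jj, kk], -1).reshape(-1, 3) * voxel + np.asarray(origin)
    for depth, T_cw in zip(depth_maps, poses):
        pts_c = pts_w @ T_cw[:3, :3].T + T_cw[:3, 3]       # world -> camera
        z = np.maximum(pts_c[:, 2], 1e-6)
        uvw = pts_c @ K.T                                  # project to pixels
        u = np.round(uvw[:, 0] / z).astype(int)
        v = np.round(uvw[:, 1] / z).astype(int)
        h, w = depth.shape
        ok = (pts_c[:, 2] > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.where(ok, depth[v % h, u % w], 0.0)         # sampled depth per voxel
        sdf = d - pts_c[:, 2]                              # signed distance along the ray
        keep = ok & (d > 0) & (sdf > -trunc)
        new = np.clip(sdf / trunc, -1.0, 1.0)
        tsdf[keep] = (tsdf[keep] * weight[keep] + new[keep]) / (weight[keep] + 1.0)
        weight[keep] += 1.0                                # running average update
    return tsdf.reshape(dims)   # surface = zero level set (e.g. marching cubes)
```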
- Towards the Probabilistic Fusion of Learned Priors into Standard Pipelines for 3D Reconstruction [31.55322925389011]
We train a deep neural network to predict discrete, nonparametric probability distributions for the depth of each pixel from a single image.
We then fuse this "probability volume" with another probability volume based on the photometric consistency between subsequent frames and the reference image.
arXiv Detail & Related papers (2022-07-27T11:28:49Z)
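A minimal sketch of the fusion step just described, assuming per-pixel discrete depth distributions over shared bins and treating the two volumes as independent likelihoods (names are illustrative, not the paper's code):

```python
import numpy as np

def fuse_probability_volumes(p_net, p_photo, depth_bins):
    """p_net, p_photo: (H, W, D) per-pixel depth distributions;
    depth_bins: (D,) bin centres. Fuse by element-wise multiplication
    and renormalisation, then read off the expected depth."""
    fused = p_net * p_photo                              # Bayes-style product
    fused /= fused.sum(axis=-1, keepdims=True) + 1e-12   # renormalise per pixel
    depth = (fused * depth_bins).sum(axis=-1)            # expected depth
    confidence = fused.max(axis=-1)                      # peakiness as confidence
    return depth, confidence
```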
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
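One way such a planar constraint can look in code, as a hedged illustration (the loss form, masks, and names are assumptions, not the paper's formulation): normals predicted for floor pixels are pushed to align with the up axis, wall normals to be orthogonal to it.

```python
import numpy as np

def manhattan_normal_loss(normals, floor_mask, wall_mask, up=(0.0, 0.0, 1.0)):
    """normals: (H, W, 3) unit surface normals from the reconstruction;
    floor_mask, wall_mask: (H, W) boolean masks from 2D segmentation."""
    cos = normals @ np.asarray(up)                        # cosine to the up axis
    loss_floor = (1.0 - cos[floor_mask]).mean() if floor_mask.any() else 0.0
    loss_wall = np.abs(cos[wall_mask]).mean() if wall_mask.any() else 0.0
    return loss_floor + loss_wall                         # added to the usual losses
```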
- GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network [11.332580333969302]
This work brings a new solution with a set of improvements, which increase the quantitative and qualitative understanding of depth maps.
A graph convolutional network (GCN) can handle the convolution on non-Euclidean data and it can be applied to irregular image regions within a topological structure.
Our method provides comparable and promising results, with a high prediction accuracy of 89% on the publicly available KITTI and Make3D datasets.
arXiv Detail & Related papers (2021-12-13T16:46:25Z)
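For reference, this is the basic graph-convolution building block such a method rests on (a standard Kipf-and-Welling-style layer, not GCNDepth's exact architecture):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
    A: (N, N) adjacency over nodes (e.g. irregular image regions),
    H: (N, F) node features, W: (F, F') learnable weights."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # symmetric normalisation
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # propagate, then ReLU
```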
- 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
- Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy and incomplete target depth maps, we reconstruct a depth map restored by virtue of using the CNN network structure as a prior.
arXiv Detail & Related papers (2020-01-21T21:56:01Z)
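A hedged sketch of a generic deep-image-prior completion loop for depth (the paper's view-constrained variant additionally enforces photometric consistency with other viewpoints, omitted here; the architecture and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

def dip_depth_completion(noisy_depth, mask, steps=2000, lr=1e-3):
    """noisy_depth: (1, 1, H, W) target depth; mask: 1 where observed.
    A small CNN with a fixed random input is fitted only to the observed
    depths; the network structure itself regularises the missing regions."""
    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 1, 3, padding=1),
    )
    z = torch.randn(1, 32, *noisy_depth.shape[-2:])       # fixed noise input
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = net(z)
        loss = ((pred - noisy_depth) ** 2 * mask).mean()  # fit observed pixels only
        loss.backward()
        opt.step()
    return net(z).detach()                                # restored depth map
```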