Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion
- URL: http://arxiv.org/abs/2012.10296v1
- Date: Fri, 18 Dec 2020 15:19:46 GMT
- Title: Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion
- Authors: Lam Huynh, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila
- Abstract summary: We address the problem of fusing monocular depth estimation with a conventional multi-view stereo or SLAM pipeline.
We use a conventional pipeline to produce a sparse 3D point cloud that is fed to a monocular depth estimation network to enhance its performance.
We demonstrate the efficacy of our approach by integrating it with a SLAM system built into mobile devices.
- Score: 46.97673710849343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the problem of fusing monocular depth estimation with a conventional multi-view stereo or SLAM pipeline to exploit the best of both worlds, that is, the accurate, dense depth of the former and the light weight of the latter. More specifically, we use a conventional pipeline to produce a sparse 3D point cloud that is fed to a monocular depth estimation network to enhance its performance. In this way, we can achieve accuracy similar to multi-view stereo with a considerably smaller number of weights. We also show that even as few as 32 points are sufficient to outperform the best monocular depth estimation methods, and that around 200 points are enough to gain the full advantage of the additional information. Moreover, we demonstrate the efficacy of our approach by integrating it with a SLAM system built into mobile devices.
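The abstract's fusion recipe is easy to picture in code. Below is a minimal, hypothetical PyTorch sketch (ours, not the authors' architecture): the sparse SLAM/MVS points are rasterized into a depth-plus-validity image that is simply concatenated with the RGB input of a small encoder-decoder.

```python
import torch
import torch.nn as nn

def rasterize_points(points_uvz, h, w):
    """Scatter sparse 3D points (pixel u, v and depth z) into a one-channel
    sparse depth map plus a validity mask; per the paper, a few dozen to a
    couple hundred points already carry useful signal."""
    depth = torch.zeros(1, 1, h, w)
    mask = torch.zeros(1, 1, h, w)
    for u, v, z in points_uvz:
        depth[0, 0, int(v), int(u)] = z
        mask[0, 0, int(v), int(u)] = 1.0
    return torch.cat([depth, mask], dim=1)  # (1, 2, h, w)

class FusionDepthNet(nn.Module):
    """Hypothetical lightweight net: RGB (3 ch) + sparse depth and mask (2 ch)
    in, dense depth out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(5, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1), nn.Softplus(),  # keep depth positive
        )

    def forward(self, rgb, sparse):
        return self.decoder(self.encoder(torch.cat([rgb, sparse], dim=1)))
```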
Related papers
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- MonoCD: Monocular 3D Object Detection with Complementary Depths [9.186673054867866]
Depth estimation is an essential but challenging subtask of monocular 3D object detection.
We propose to increase the complementarity of depths with two novel designs.
Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
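As a hedged illustration of what a complementary depth cue looks like (simplified; not MonoCD's exact design), the classic pinhole height prior gives a second, geometrically derived depth whose error profile differs from a directly regressed one:

```python
def height_prior_depth(fy: float, object_height_m: float, bbox_height_px: float) -> float:
    """Pinhole relation z ~ f_y * H / h: depth of an object of known metric
    height H spanning h pixels vertically. Combining this cue with directly
    regressed depth is one way their errors can cancel (MonoCD's actual
    combination is learned, not this closed form)."""
    return fy * object_height_m / bbox_height_px

# Example: fy = 721.5 px (KITTI-like), a 1.5 m car spanning 60 px -> ~18 m away.
print(height_prior_depth(721.5, 1.5, 60.0))
```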
arXiv Detail & Related papers (2024-04-04T03:30:49Z)
- NDDepth: Normal-Distance Assisted Monocular Depth Estimation [22.37113584192617]
We propose a novel physics (geometry)-driven deep learning framework for monocular depth estimation.
We introduce a new normal-distance head that outputs pixel-level surface normal and plane-to-origin distance for deriving depth at each position.
We develop an effective contrastive iterative refinement module that refines depth in a complementary manner according to the depth uncertainty.
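Deriving depth from the two outputs follows a standard plane-camera relation: a pixel ray K^{-1}(u, v, 1) meets a local plane with unit normal n and plane-to-origin distance d at depth z = d / (n · K^{-1}(u, v, 1)). A NumPy sketch of that relation, with our own variable names:

```python
import numpy as np

def depth_from_normal_distance(normal, distance, K):
    """normal: (H, W, 3) unit surface normals (camera frame),
    distance: (H, W) plane-to-origin distances, K: (3, 3) intrinsics.
    Returns per-pixel depth z = d / (n . K^{-1} p)."""
    h, w = distance.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T           # back-projected rays K^{-1} p
    denom = np.sum(normal * rays, axis=-1)    # n . K^{-1} p
    return distance / np.clip(np.abs(denom), 1e-6, None)
```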
arXiv Detail & Related papers (2023-09-19T13:05:57Z)
- Probabilistic Volumetric Fusion for Dense Monocular SLAM [33.156523309257786]
We present a novel method to reconstruct 3D scenes by leveraging deep dense monocular SLAM and fast uncertainty propagation.
The proposed approach is able to reconstruct 3D scenes densely, accurately, and in real time while being robust to extremely noisy depth estimates.
We show that our approach achieves 92% better accuracy than directly fusing depths from monocular SLAM, and up to 90% improvements compared to the best competing approach.
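A generic sketch of uncertainty-aware volumetric fusion (our simplification, not the paper's exact update): each depth observation moves the fused signed-distance value with inverse-variance weight, so extremely noisy estimates contribute almost nothing.

```python
def fuse_sdf_observation(sdf, weight, sdf_obs, sigma):
    """One weighted TSDF-style update per voxel. sigma is the propagated
    depth uncertainty of the new observation; 1/sigma^2 is its weight."""
    w_obs = 1.0 / (sigma * sigma + 1e-12)
    fused = (sdf * weight + sdf_obs * w_obs) / (weight + w_obs)
    return fused, weight + w_obs
```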
arXiv Detail & Related papers (2022-10-03T23:53:35Z)
- Uncertainty Guided Depth Fusion for Spike Camera [49.41822923588663]
We propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse predictions of monocular and stereo depth estimation networks for spike camera.
Our framework is motivated by the fact that stereo spike depth estimation achieves better results at close range, while monocular spike depth estimation performs better at long range.
In order to demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K.
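A hypothetical per-pixel form of such uncertainty-guided fusion (the paper's rule may differ): weight each branch by the other branch's uncertainty, so confident stereo predictions dominate at close range and the monocular branch takes over elsewhere.

```python
def uncertainty_guided_fusion(d_mono, u_mono, d_stereo, u_stereo):
    """Blend two depth maps element-wise; inputs are arrays/tensors of the
    same shape. Low stereo uncertainty -> small w -> stereo dominates."""
    w = u_stereo / (u_mono + u_stereo + 1e-12)  # weight on the monocular depth
    return w * d_mono + (1.0 - w) * d_stereo
```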
arXiv Detail & Related papers (2022-08-26T13:04:01Z)
- Scale-aware direct monocular odometry [4.111899441919165]
We present a framework for direct monocular odometry based on depth prediction from a deep neural network.
Our proposal largely outperforms classic monocular SLAM, being 5 to 9 times more precise, with accuracy closer to that of stereo systems.
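One common way to make monocular odometry scale-aware (sketched here under our own assumptions; the paper's mechanism may differ) is to align the up-to-scale SLAM depths to CNN-predicted metric depths with a robust ratio:

```python
import numpy as np

def metric_scale(d_slam, d_cnn):
    """d_slam: up-to-scale depths of tracked points; d_cnn: network-predicted
    metric depths at the same pixels. The median ratio is robust to outliers;
    multiplying the SLAM map by it restores metric scale."""
    return float(np.median(np.asarray(d_cnn) / np.asarray(d_slam)))
```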
arXiv Detail & Related papers (2021-09-21T10:30:15Z)
- Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction [72.30870535815258]
Classical monocular SLAM and CNNs for monocular depth prediction represent two largely disjoint approaches towards building a 3D map of the surrounding environment.
We propose a joint narrow and wide baseline based self-improving framework, where on the one hand the CNN-predicted depth is leveraged to perform pseudo RGB-D feature-based SLAM.
On the other hand, the bundle-adjusted 3D scene structures and camera poses from the more principled geometric SLAM are injected back into the depth network through novel wide baseline losses.
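As a rough sketch of how bundle-adjusted structure can be injected back into the depth network (a hypothetical loss, not the paper's exact wide-baseline formulation): penalize the prediction wherever triangulated SLAM landmarks disagree with it.

```python
import torch

def landmark_depth_loss(pred_depth, landmarks_uvz):
    """pred_depth: (H, W) network output; landmarks_uvz: (N, 3) tensor of
    pixel coordinates u, v and bundle-adjusted depth z. Log-depth residuals
    are robust to the scale drift typical of monocular pipelines."""
    u, v, z = landmarks_uvz.unbind(dim=1)
    pred = pred_depth[v.long(), u.long()]
    return torch.abs(torch.log(pred) - torch.log(z)).mean()
```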
arXiv Detail & Related papers (2020-04-22T16:31:59Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
- D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry [57.5549733585324]
D3VO is a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation.
We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision.
We model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy.
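The uncertainty modeling follows the familiar aleatoric pattern: residuals are down-weighted by a predicted per-pixel sigma at the cost of a log-sigma penalty. A minimal sketch (D3VO's exact weighting may differ in detail):

```python
import torch

def uncertain_photometric_loss(img, img_warped, sigma):
    """Per-pixel photometric residual attenuated by predicted uncertainty;
    the log term stops the network from inflating sigma everywhere."""
    residual = torch.abs(img - img_warped)
    return (residual / sigma + torch.log(sigma)).mean()
```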
arXiv Detail & Related papers (2020-03-02T17:47:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.