Deep Depth Estimation from Visual-Inertial SLAM
- URL: http://arxiv.org/abs/2008.00092v2
- Date: Fri, 14 Aug 2020 22:00:36 GMT
- Title: Deep Depth Estimation from Visual-Inertial SLAM
- Authors: Kourosh Sartipi, Tien Do, Tong Ke, Khiem Vuong, Stergios I.
Roumeliotis
- Abstract summary: We study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system.
The resulting point cloud is low-density, noisy, and non-uniformly distributed in space.
We use the available gravity estimate from the VI-SLAM to warp the input image to the orientation prevailing in the training dataset.
- Score: 11.814395824799988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of learning to complete a scene's depth from
sparse depth points and images of indoor scenes. Specifically, we study the
case in which the sparse depth is computed from a visual-inertial simultaneous
localization and mapping (VI-SLAM) system. The resulting point cloud is
low-density, noisy, and non-uniformly distributed in space, as compared to
the input from active depth sensors, e.g., LiDAR or Kinect. Since the VI-SLAM
produces point clouds only over textured areas, we compensate for the missing
depth of the low-texture surfaces by leveraging their planar structures and
their surface normals, an important intermediate representation. The
pre-trained surface normal network, however, suffers from large performance
degradation when there is a significant difference in the viewing direction
(especially the roll angle) of the test image compared to the training images.
To address this limitation, we use the available gravity estimate from the
VI-SLAM to warp the input image to the orientation prevailing in the training
dataset. This results in a significant performance gain for the surface normal
estimates, and thus for the dense depth estimates. Finally, we show that our method
outperforms other state-of-the-art approaches both on training (ScanNet and
NYUv2) and testing (collected with Azure Kinect) datasets.
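To make the gravity-based warping concrete, here is a minimal sketch of the idea, not the authors' implementation: recover the in-plane roll from the VI-SLAM gravity estimate expressed in the camera frame, rotate the image upright before running the surface normal network, and rotate the predicted normals back afterwards. The helper names, the axis convention (x right, y down, z forward), and the use of OpenCV are assumptions.

```python
import cv2
import numpy as np

def roll_from_gravity(g_cam):
    """Roll angle (radians) between the image's 'down' axis (+y) and the
    gravity direction g_cam (3-vector in the camera frame, e.g. taken from
    the VI-SLAM state). Assumes x right, y down, z forward; for an upright
    camera, gravity projects onto +y and the angle is zero."""
    return np.arctan2(g_cam[0], g_cam[1])

def warp_to_upright(image, g_cam):
    """In-plane rotate the image so that projected gravity points down,
    i.e. toward the orientation prevailing in the training data.
    Note: the sign of the rotation depends on the axis convention;
    verify on a sample image."""
    h, w = image.shape[:2]
    angle_deg = np.degrees(roll_from_gravity(g_cam))
    R = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
    return cv2.warpAffine(image, R, (w, h)), angle_deg

# Usage sketch: predict normals on the upright image, then rotate the
# predicted normal vectors back by the inverse in-plane rotation before
# using them for depth completion.
```

Because the rotation is purely in-plane, this corrects only the roll component of the viewing direction, which matches the failure mode the abstract identifies for the pre-trained normal network.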
Related papers
- Depth Insight -- Contribution of Different Features to Indoor
Single-image Depth Estimation [8.712751056826283]
We quantify the relative contributions of known depth cues in a monocular depth estimation setting.
Our work uses feature extraction techniques to relate the individual features of shape, texture, colour, and saturation, taken in isolation, to depth prediction.
arXiv Detail & Related papers (2023-11-16T17:38:21Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
However, it relies on the multi-view consistency assumption for training, which is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model to generate a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when trained on monocular videos of highly dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Depth Monocular Estimation with Attention-based Encoder-Decoder Network from Single Image [7.753378095194288]
Vision-based approaches have recently received much attention and can overcome these drawbacks.
In this work, we explore an extreme vision-based scenario: estimating a depth map from a single monocular image, a setting whose predictions are often severely plagued by grid artifacts and blurry edges.
Our approach can find the focus of the current image with minimal overhead while avoiding the loss of depth features.
arXiv Detail & Related papers (2022-10-24T23:01:25Z)
- IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty [24.4764181300196]
We introduce a novel framework that uses the surface normal and its uncertainty to recurrently refine the predicted depth map (the plane-based geometry behind this kind of refinement is sketched after this list).
The proposed method shows state-of-the-art performance on NYUv2 and iBims-1, both in terms of depth and normals.
arXiv Detail & Related papers (2022-10-07T16:34:20Z)
- Visual Attention-based Self-supervised Absolute Depth Estimation using Geometric Priors in Autonomous Driving [8.045833295463094]
We introduce a fully Visual Attention-based Depth (VADepth) network, where spatial attention and channel attention are applied to all stages.
By continuously extracting long-range feature dependencies along the spatial and channel dimensions, the VADepth network can effectively preserve important details.
Experimental results on the KITTI dataset show that this architecture achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-05-18T08:01:38Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive performance in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, instead of different views, we rely on depth-from-defocus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
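Both the planar completion in the main paper and normal-guided refinement schemes such as IronDepth above rest on the same geometric fact: a surface normal plus a single known depth determine the depth of every pixel on that plane. Below is a minimal sketch of this relation under a pinhole camera model; the function and variable names are hypothetical, and the snippet illustrates the geometry rather than any paper's implementation.

```python
import numpy as np

def plane_depth(K, n, uv0, d0, uv):
    """Depth at pixel uv for the plane with unit normal n (camera frame)
    passing through the 3D point behind pixel uv0 at depth d0.

    K: 3x3 camera intrinsics; uv0, uv: pixel coordinates (u, v).
    """
    K_inv = np.linalg.inv(K)
    r0 = K_inv @ np.array([uv0[0], uv0[1], 1.0])  # back-projected ray, unit z-component
    X0 = d0 * r0                                  # known 3D point on the plane
    r = K_inv @ np.array([uv[0], uv[1], 1.0])     # ray through the query pixel
    denom = n @ r
    if abs(denom) < 1e-9:                         # ray (nearly) parallel to the plane
        return None
    # Plane equation n . X = n . X0 with X = z * r gives the depth directly.
    return (n @ X0) / denom

# Example: a fronto-parallel plane (n = [0, 0, 1]) yields the same depth everywhere.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
print(plane_depth(K, np.array([0.0, 0.0, 1.0]), (320, 240), 2.0, (100, 50)))  # -> 2.0
```

In a completion pipeline of the kind the main paper describes, the known depths would come from the sparse VI-SLAM points and the normals from the (gravity-corrected) normal network, so the quality of the normals directly bounds the quality of the densified depth.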