Related papers: CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth

CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth

URL: http://arxiv.org/abs/2012.10133v3
Date: Fri, 19 May 2023 09:12:35 GMT
Title: CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth
Authors: Xingxing Zuo, Nathaniel Merrill, Wei Li, Yong Liu, Marc Pollefeys, Guoquan Huang
Abstract summary: We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system. We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction. We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
Score: 83.77839773394106
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we present a lightweight, tightly-coupled deep depth network and visual-inertial odometry (VIO) system, which can provide accurate state estimates and dense depth maps of the immediate surroundings. Leveraging the proposed lightweight Conditional Variational Autoencoder (CVAE) for depth inference and encoding, we provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction and generalization capability. The compact encoded depth maps are then updated jointly with navigation states in a sliding window estimator in order to provide the dense local scene geometry. We additionally propose a novel method to obtain the CVAE's Jacobian which is shown to be more than an order of magnitude faster than previous works, and we additionally leverage First-Estimate Jacobian (FEJ) to avoid recalculation. As opposed to previous works relying on completely dense residuals, we propose to only provide sparse measurements to update the depth code and show through careful experimentation that our choice of sparse measurements and FEJs can still significantly improve the estimated depth maps. Our full system also exhibits state-of-the-art pose estimation accuracy, and we show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.

Related papers

Depth Anything with Any Prior [64.39991799606146]
Prior Depth Anything is a framework that combines incomplete but precise metric information in depth measurement with relative but complete geometric structures in depth prediction.<n>We develop a conditioned monocular depth estimation (MDE) model to refine the inherent noise of depth priors.<n>Our model showcases impressive zero-shot generalization across depth completion, super-resolution, and inpainting over 7 real-world datasets.
arXiv Detail & Related papers (2025-05-15T17:59:50Z)
Temporal Lidar Depth Completion [0.08192907805418582]
We show how a state-of-the-art method PENet can be modified to benefit from recurrency. Our algorithm achieves state-of-the-art results on the KITTI depth completion dataset.
arXiv Detail & Related papers (2024-06-17T08:25:31Z)
Metrically Scaled Monocular Depth Estimation through Sparse Priors for Underwater Robots [0.0]
We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions. The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea. The method achieves real-time performance, running at 160 FPS on a laptop GPU and 7 FPS on a single CPU core.
arXiv Detail & Related papers (2023-10-25T16:32:31Z)
Monocular Visual-Inertial Depth Estimation [66.71452943981558]
We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry. Our approach performs global scale and shift alignment against sparse metric depth, followed by learning-based dense alignment. We evaluate on the TartanAir and VOID datasets, observing up to 30% reduction in RMSE with dense scale alignment.
arXiv Detail & Related papers (2023-03-21T18:47:34Z)
VA-DepthNet: A Variational Approach to Single Image Depth Prediction [163.14849753700682]
VA-DepthNet is a simple, effective, and accurate deep neural network approach for the single-image depth prediction problem. The paper demonstrates the usefulness of the proposed approach via extensive evaluation and ablation analysis over several benchmark datasets.
arXiv Detail & Related papers (2023-02-13T17:55:58Z)
Lightweight Monocular Depth Estimation with an Edge Guided Network [34.03711454383413]
We present a novel lightweight Edge Guided Depth Estimation Network (EGD-Net) In particular, we start out with a lightweight encoder-decoder architecture and embed an edge guidance branch. In order to aggregate the context information and edge attention features, we design a transformer-based feature aggregation module.
arXiv Detail & Related papers (2022-09-29T14:45:47Z)
Depth Completion using Plane-Residual Representation [84.63079529738924]
We introduce a novel way of interpreting depth information with the closest depth plane label $p$ and a residual value $r$, as we call it, Plane-Residual (PR) representation. By interpreting depth information in PR representation and using our corresponding depth completion network, we were able to acquire improved depth completion performance with faster computation.
arXiv Detail & Related papers (2021-04-15T10:17:53Z)
Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars. In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors. We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net. Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $delta1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
Deep Multi-view Depth Estimation with Predicted Uncertainty [11.012201499666503]
We employ a dense-optical-flow network to compute correspondences and then triangulate the point cloud to obtain an initial depth map. To further increase the triangulation accuracy, we introduce a depth-refinement network (DRN) that optimize the initial depth map based on the image's contextual cues.
arXiv Detail & Related papers (2020-11-19T00:22:09Z)
Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples. We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.