HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors
- URL: http://arxiv.org/abs/2407.18443v2
- Date: Mon, 28 Oct 2024 23:54:10 GMT
- Title: HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors
- Authors: Ashkan Ganj, Hang Su, Tian Guo
- Abstract summary: We propose HYBRIDDEPTH, a robust depth estimation pipeline that addresses key challenges in depth estimation.
We test our pipeline as an end-to-end system, with a newly developed mobile client to capture focal stacks, which are then sent to a GPU-powered server for depth estimation.
Comprehensive quantitative and qualitative analyses demonstrate that HYBRIDDEPTH outperforms state-of-the-art (SOTA) models on common datasets.
- Score: 10.88048563201236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose HYBRIDDEPTH, a robust depth estimation pipeline that addresses key challenges in depth estimation, including scale ambiguity, hardware heterogeneity, and generalizability. HYBRIDDEPTH leverages focal stacks, data conveniently accessible on common mobile devices, to produce accurate metric depth maps. By incorporating depth priors afforded by recent advances in single-image depth estimation, our model achieves a higher level of structural detail compared to existing methods. We test our pipeline as an end-to-end system, with a newly developed mobile client to capture focal stacks, which are then sent to a GPU-powered server for depth estimation. Comprehensive quantitative and qualitative analyses demonstrate that HYBRIDDEPTH outperforms state-of-the-art (SOTA) models on common datasets such as DDFF12 and NYU Depth V2. HYBRIDDEPTH also shows strong zero-shot generalization. When trained on NYU Depth V2, HYBRIDDEPTH surpasses SOTA models in zero-shot performance on ARKitScenes and delivers more structurally accurate depth maps on Mobile Depth.
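The scale ambiguity named in the abstract can be made concrete with a small example. A single-image model typically yields depth only up to an unknown scale and shift, so some metric anchor is needed to convert it to metres. The sketch below assumes a global least-squares scale-and-shift fit against metric anchor values (for instance, depth-from-focus estimates). The abstract does not state that this is the fusion step HYBRIDDEPTH uses, and the function names `fit_scale_shift` and `to_metric` and the toy values are hypothetical.

```python
def fit_scale_shift(relative, metric):
    """Return (scale, shift) minimising sum((scale * r + shift - m) ** 2).

    `relative` holds relative depth samples (arbitrary units) and `metric`
    holds the paired metric anchors (metres). Illustrative helper only,
    not taken from the paper.
    """
    n = len(relative)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    # Variance and covariance terms of the closed-form 1-D least-squares fit.
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0:
        raise ValueError("relative depth is constant; scale is undetermined")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    scale = cov / var_r
    shift = mean_m - scale * mean_r
    return scale, shift


def to_metric(relative, scale, shift):
    """Apply the fitted affine map to every relative depth value."""
    return [scale * r + shift for r in relative]


# Toy data (assumed): the anchors were generated as 2.0 * r + 0.5.
relative = [0.1, 0.4, 0.5, 0.9]
metric_anchor = [0.7, 1.3, 1.5, 2.3]

scale, shift = fit_scale_shift(relative, metric_anchor)
metric_depth = to_metric(relative, scale, shift)
```

A single global scale and shift cannot correct errors that vary across the image, which is one reason a pipeline might add a learned refinement stage after this kind of alignment.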
Related papers
- DepthLab: From Partial to Complete [80.58276388743306]
Missing values remain a common challenge for depth data across its wide range of applications.
This work bridges this gap with DepthLab, a foundation depth inpainting model powered by image diffusion priors.
Our approach proves its worth in various downstream tasks, including 3D scene inpainting, text-to-3D scene generation, sparse-view reconstruction with DUST3R, and LiDAR depth completion.
arXiv Detail & Related papers (2024-12-24T04:16:38Z) - Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation [108.04354143020886]
We introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything.
We use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution.
arXiv Detail & Related papers (2024-12-18T16:32:12Z) - DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z) - Depth Pro: Sharp Monocular Metric Depth in Less Than a Second [45.6690958201871]
We present a foundation model for zero-shot metric monocular depth estimation.
Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details.
It produces a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.
arXiv Detail & Related papers (2024-10-02T22:42:20Z) - ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z) - OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations [23.0962036039182]
"Optimization-Guided Neural Iterations" (OGNI) is a novel framework for depth completion.
OGNI-DC exhibits strong generalization, outperforming baselines on unseen datasets and across various sparsity levels.
It has high accuracy, achieving state-of-the-art performance on the NYUv2 and the KITTI benchmarks.
arXiv Detail & Related papers (2024-06-17T16:30:29Z) - Metrically Scaled Monocular Depth Estimation through Sparse Priors for
Underwater Robots [0.0]
We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions.
The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea.
The method achieves real-time performance, running at 160 FPS on a laptop GPU and 7 FPS on a single CPU core.
arXiv Detail & Related papers (2023-10-25T16:32:31Z) - GEDepth: Ground Embedding for Monocular Depth Estimation [4.95394574147086]
This paper proposes a novel ground embedding module to decouple camera parameters from pictorial cues.
A ground attention is designed in the module to optimally combine ground depth with residual depth.
Experiments reveal that our approach achieves state-of-the-art results on popular benchmarks.
arXiv Detail & Related papers (2023-09-18T17:56:06Z) - Deep Neighbor Layer Aggregation for Lightweight Self-Supervised
Monocular Depth Estimation [1.6775954077761863]
We present a fully convolutional depth estimation network using contextual feature fusion.
Compared to UNet++ and HRNet, we use high-resolution and low-resolution features to preserve information on small targets and fast-moving objects.
Our method reduces the parameters without sacrificing accuracy.
arXiv Detail & Related papers (2023-09-17T13:40:15Z) - NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation [58.21817572577012]
Video depth estimation aims to infer temporally consistent depth.
We introduce NVDS+ that stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner.
We also elaborate a large-scale Video Depth in the Wild dataset, which contains 14,203 videos with over two million frames.
arXiv Detail & Related papers (2023-07-17T17:57:01Z) - Self-Supervised Learning based Depth Estimation from Monocular Images [0.0]
The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input.
We plan to estimate intrinsic camera parameters during training and apply weather augmentations to further generalize our model.
arXiv Detail & Related papers (2023-04-14T07:14:08Z) - Monocular Visual-Inertial Depth Estimation [66.71452943981558]
We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry.
Our approach performs global scale and shift alignment against sparse metric depth, followed by learning-based dense alignment.
We evaluate on the TartanAir and VOID datasets, observing up to 30% reduction in RMSE with dense scale alignment.
arXiv Detail & Related papers (2023-03-21T18:47:34Z) - Uncertainty Guided Depth Fusion for Spike Camera [49.41822923588663]
We propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse predictions of monocular and stereo depth estimation networks for spike cameras.
Our framework is motivated by the fact that stereo spike depth estimation achieves better results at close range.
In order to demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K.
arXiv Detail & Related papers (2022-08-26T13:04:01Z) - Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z) - HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation [14.81943833870932]
We present an improved DepthNet, HR-Depth, with two effective strategies.
Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolution.
arXiv Detail & Related papers (2020-12-14T09:15:15Z) - Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z) - Efficient Depth Completion Using Learned Bases [94.0808155168311]
We propose a new global geometry constraint for depth completion.
By assuming depth maps often lie on low-dimensional subspaces, a dense depth map can be approximated by a weighted sum of full-resolution principal depth bases.
arXiv Detail & Related papers (2020-12-02T11:57:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.