Boosting Monocular Depth Estimation Models to High-Resolution via
Content-Adaptive Multi-Resolution Merging
- URL: http://arxiv.org/abs/2105.14021v1
- Date: Fri, 28 May 2021 17:55:15 GMT
- Title: Boosting Monocular Depth Estimation Models to High-Resolution via
Content-Adaptive Multi-Resolution Merging
- Authors: S. Mahdi H. Miangoleh, Sebastian Dille, Long Mai, Sylvain Paris,
Yağız Aksoy
- Abstract summary: We show how a consistent scene structure and high-frequency details affect depth estimation performance.
We present a double estimation method that improves the whole-image depth estimation and a patch selection method that adds local details.
We demonstrate that by merging estimations at different resolutions with changing context, we can generate multi-megapixel depth maps with a high level of detail.
- Score: 14.279471205248534
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural networks have shown great abilities in estimating depth from a single
image. However, the inferred depth maps are well below one-megapixel resolution
and often lack fine-grained details, which limits their practicality. Our
method builds on our analysis of how the input resolution and the scene
structure affect depth estimation performance. We demonstrate that there is a
trade-off between a consistent scene structure and the high-frequency details,
and merge low- and high-resolution estimations to take advantage of this
duality using a simple depth merging network. We present a double estimation
method that improves the whole-image depth estimation and a patch selection
method that adds local details to the final result. We demonstrate that by
merging estimations at different resolutions with changing context, we can
generate multi-megapixel depth maps with a high level of detail using a
pre-trained model.
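The paper's actual merging is done by a trained depth merging network; as a rough intuition for the low/high-resolution trade-off it exploits, the sketch below merges two depth estimates by keeping the coarse scene structure from the low-resolution map and adding the high-frequency residual of the high-resolution map. All function names here are illustrative assumptions, not the authors' code, and the box blur stands in for a generic low-pass filter.

```python
import numpy as np

def box_blur(d, k=5):
    # simple box blur as a stand-in for a low-pass filter
    pad = k // 2
    p = np.pad(d, pad, mode="edge")
    out = np.zeros_like(d, dtype=float)
    for i in range(d.shape[0]):
        for j in range(d.shape[1]):
            out[i, j] = p[i:i + k, j:j + k].mean()
    return out

def merge_depths(low_res, high_res):
    """Keep the consistent scene structure of the low-resolution
    estimate and the high-frequency detail of the high-resolution one."""
    # nearest-neighbor upsample of the low-res map to the high-res grid
    sy = high_res.shape[0] // low_res.shape[0]
    sx = high_res.shape[1] // low_res.shape[1]
    base = np.kron(low_res, np.ones((sy, sx)))
    # high-frequency residual of the high-res estimate
    detail = high_res - box_blur(high_res)
    return base + detail
```

In the paper this frequency split is learned rather than hand-crafted, and the same merge is applied both image-wide (double estimation) and on selected patches.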
Related papers
- UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing [77.10640210751981]
UDPNet is a general framework that leverages depth-based priors from the large-scale pretrained depth estimation model DepthAnything V2.
Our proposed solution establishes a new benchmark for depth-aware dehazing across various scenarios.
arXiv Detail & Related papers (2026-01-11T13:29:02Z) - InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields [62.49846959186119]
This paper introduces InfiniDepth, which represents depth as neural implicit fields.
We can query depth at continuous 2D coordinates, enabling arbitrary-resolution and fine-grained depth estimation.
InfiniDepth achieves state-of-the-art performance on both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2026-01-06T18:57:06Z) - One Look is Enough: A Novel Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation Models on High-Resolution Images [25.48185527420231]
We propose Patch Refine Once (PRO), an efficient and generalizable tile-based framework.
Our PRO consists of two key components: (i) Grouped Patch Consistency Training that enhances test-time efficiency while mitigating the depth discontinuity problem.
Our PRO harmonizes well with existing depth estimation models, keeping their capabilities effective on grid inputs of high-resolution images while introducing few depth discontinuities at the grid boundaries.
arXiv Detail & Related papers (2025-03-28T11:46:50Z) - High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy [23.431898388115044]
High-precision dichotomous image segmentation (DIS) is the task of extracting fine-grained objects from high-resolution images.
Existing methods face a dilemma: non-diffusion methods work efficiently but suffer from false or missed detections due to weak semantics.
We find that pseudo-depth information from monocular depth estimation models can provide essential semantic understanding.
arXiv Detail & Related papers (2025-03-08T07:02:28Z) - Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering [6.372979654151044]
Current state-of-the-art monocular depth estimators, trained on extensive datasets, generalize well but lack the 3D consistency needed for many applications.
In this paper, we combine the strength of these generalizing monocular depth estimation techniques with multi-view data by framing the task as an analysis-by-synthesis optimization problem.
Our method generates detailed, high-quality, view-consistent, accurate depth maps, even in challenging indoor scenarios, and outperforms state-of-the-art multi-view depth reconstruction approaches on such datasets.
arXiv Detail & Related papers (2024-10-04T18:50:28Z) - Self-supervised Monocular Depth Estimation with Large Kernel Attention [30.44895226042849]
We propose a self-supervised monocular depth estimation network to get finer details.
Specifically, we propose a decoder based on large kernel attention, which can model long-distance dependencies.
Our method achieves competitive results on the KITTI dataset.
arXiv Detail & Related papers (2024-09-26T14:44:41Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - Multi-resolution Monocular Depth Map Fusion by Self-supervised
Gradient-based Composition [14.246972408737987]
We propose a novel depth map fusion module to combine the advantages of estimations with multi-resolution inputs.
Our lightweight depth fusion is one-shot and runs in real-time, making our method 80X faster than a state-of-the-art depth fusion method.
arXiv Detail & Related papers (2022-12-03T05:13:50Z) - Multi-Camera Collaborative Depth Prediction via Consistent Structure
Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z) - RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation [27.679479140943503]
We propose a resolution adaptive self-supervised monocular depth estimation method (RA-Depth) by learning the scale invariance of the scene depth.
RA-Depth achieves state-of-the-art performance, and also exhibits a good ability of resolution adaptation.
arXiv Detail & Related papers (2022-07-25T08:49:59Z) - Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched
Data [73.9872931307401]
We propose a novel weakly-supervised framework to train a monocular depth estimation network.
The proposed framework is composed of a sharing weight monocular depth estimation network and a depth reconstruction network for distillation.
Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning-based schemes.
arXiv Detail & Related papers (2021-09-23T18:04:12Z) - Differentiable Diffusion for Dense Depth Estimation from Multi-view
Images [31.941861222005603]
We present a method to estimate dense depth by optimizing a sparse set of points such that their diffusion into a depth map minimizes a multi-view reprojection error from RGB supervision.
We also develop an efficient optimization routine that can simultaneously optimize the 50k+ points required for complex scene reconstruction.
arXiv Detail & Related papers (2021-06-16T16:17:34Z) - Towards Unpaired Depth Enhancement and Super-Resolution in the Wild [121.96527719530305]
State-of-the-art data-driven methods of depth map super-resolution rely on registered pairs of low- and high-resolution depth maps of the same scenes.
We consider an approach to depth map enhancement based on learning from unpaired data.
arXiv Detail & Related papers (2021-05-25T16:19:16Z) - Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences, and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
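The third stage above leans on classical epipolar geometry. As a hedged illustration of the geometry that stage exploits (not the paper's networks), the sketch below recovers a 3D point, and hence its depth, from one 2D correspondence and known camera projection matrices via linear (DLT) triangulation; the function name and test setup are assumptions for the example.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a single correspondence.
    P1, P2: 3x4 camera projection matrices; x1, x2: matched 2D points."""
    # each observation contributes two homogeneous linear constraints on X
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # the 3D point is the right null vector of A
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

In the paper this geometry is used the other way around: the known relative pose constrains the correspondence search to epipolar lines, shrinking the search space of the depth network.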
arXiv Detail & Related papers (2021-04-01T15:31:20Z) - Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.