MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes
- URL: http://arxiv.org/abs/2411.01048v1
- Date: Fri, 01 Nov 2024 21:30:51 GMT
- Title: MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes
- Authors: Sanghyun Byun, Jacob Song, Woo Seong Chung
- Abstract summary: Existing models are sensitive to factors such as boundary frequency of objects in the scene and scene complexity.
We propose a solution by taking samples of the image along with the initial depth map prediction made by a pre-trained MMDE model.
Compared to existing iterative depth refinement techniques, MultiDepth does not employ normal map prediction as part of its architecture.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular metric depth estimation (MMDE) is a crucial task to solve for indoor scene reconstruction on edge devices. Despite this importance, existing models are sensitive to factors such as boundary frequency of objects in the scene and scene complexity, failing to fully capture many indoor scenes. In this work, we propose to close this gap through the task of monocular metric depth refinement (MMDR) by leveraging state-of-the-art MMDE models. MultiDepth proposes a solution by taking samples of the image along with the initial depth map prediction made by a pre-trained MMDE model. Compared to existing iterative depth refinement techniques, MultiDepth does not employ normal map prediction as part of its architecture, effectively lowering the model size and computation overhead while outputting impactful changes from refining iterations. MultiDepth implements a lightweight encoder-decoder architecture for the refinement network, processing multiple samples from the given image, including segmentation masking. We evaluate MultiDepth on four datasets and compare it to state-of-the-art methods to demonstrate its effective refinement with minimal overhead, displaying accuracy improvement upward of 45%.
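As a rough illustration of the input side described in the abstract (image samples combined with an initial depth map and a segmentation mask), the sketch below stacks the three sources and crops random patches. It is not the authors' implementation: the function name `build_refinement_inputs`, the random-crop sampling strategy, the patch size, and the 5-channel layout are all assumptions made for illustration, and the refinement network itself is omitted.

```python
import numpy as np


def build_refinement_inputs(image, depth, mask, num_samples=4, patch=64, seed=0):
    """Stack RGB, initial depth and a mask, then crop random patches.

    Hypothetical helper, not taken from the paper.
    image: (H, W, 3) float array.
    depth: (H, W) initial metric depth from a pre-trained MMDE model.
    mask:  (H, W) segmentation mask.
    Returns an array of shape (num_samples, patch, patch, 5).
    """
    h, w, _ = image.shape
    if depth.shape != (h, w) or mask.shape != (h, w):
        raise ValueError("image, depth and mask must share height and width")
    if patch > h or patch > w:
        raise ValueError("patch must fit inside the image")

    # Channel layout (an assumption): R, G, B, depth, mask.
    stacked = np.concatenate(
        [image, depth[..., None], mask[..., None].astype(image.dtype)], axis=-1
    )

    # Random crop positions; the paper's actual sampling scheme may differ.
    rng = np.random.default_rng(seed)
    tops = rng.integers(0, h - patch + 1, size=num_samples)
    lefts = rng.integers(0, w - patch + 1, size=num_samples)
    return np.stack(
        [stacked[t:t + patch, l:l + patch] for t, l in zip(tops, lefts)]
    )
```

Under these assumptions, each returned patch could then be passed to a small encoder-decoder that predicts a refined depth patch.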
Related papers
- Depth Anything with Any Prior [64.39991799606146]
Prior Depth Anything is a framework that combines incomplete but precise metric information in depth measurement with relative but complete geometric structures in depth prediction.
We develop a conditioned monocular depth estimation (MDE) model to refine the inherent noise of depth priors.
Our model showcases impressive zero-shot generalization across depth completion, super-resolution, and inpainting over 7 real-world datasets.
arXiv Detail & Related papers (2025-05-15T17:59:50Z) - UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler [62.06785782635153]
We propose a new model, UniDepthV2, capable of reconstructing metric 3D scenes from solely single images across domains.
UniDepthV2 directly predicts metric 3D points from the input image at inference time without any additional information.
Our model exploits a pseudo-spherical output representation, which disentangles the camera and depth representations.
arXiv Detail & Related papers (2025-02-27T14:03:15Z) - ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z) - Q-SLAM: Quadric Representations for Monocular SLAM [85.82697759049388]
We reimagine volumetric representations through the lens of quadrics.
We use a quadric assumption to rectify noisy depth estimations from RGB inputs.
We introduce a novel quadric-decomposed transformer to aggregate information across quadrics.
arXiv Detail & Related papers (2024-03-12T23:27:30Z) - Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM).
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
arXiv Detail & Related papers (2023-10-31T20:15:40Z) - ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive depth range and depth interval [19.28042366225802]
Multi-View Stereo (MVS) is a fundamental problem in geometric computer vision.
We present a novel multi-stage coarse-to-fine framework to achieve adaptive all-pixel depth range and depth interval.
Our model achieves state-of-the-art performance and yields competitive generalization ability.
arXiv Detail & Related papers (2023-08-17T14:52:11Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries in accuracy on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z) - RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation [27.679479140943503]
We propose a resolution adaptive self-supervised monocular depth estimation method (RA-Depth) by learning the scale invariance of the scene depth.
RA-Depth achieves state-of-the-art performance, and also exhibits a good ability of resolution adaptation.
arXiv Detail & Related papers (2022-07-25T08:49:59Z) - Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry [25.003116148843525]
We propose MaGNet, a framework for fusing single-view depth probability with multi-view geometry.
MaGNet achieves state-of-the-art performance on ScanNet, 7-Scenes and KITTI.
arXiv Detail & Related papers (2021-12-15T14:56:53Z) - Differentiable Diffusion for Dense Depth Estimation from Multi-view Images [31.941861222005603]
We present a method to estimate dense depth by optimizing a sparse set of points such that their diffusion into a depth map minimizes a multi-view reprojection error from RGB supervision.
We also develop an efficient optimization routine that can simultaneously optimize the 50k+ points required for complex scene reconstruction.
arXiv Detail & Related papers (2021-06-16T16:17:34Z) - Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras produce brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.