Related papers: SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model

URL: http://arxiv.org/abs/2403.08556v1
Date: Wed, 13 Mar 2024 14:08:25 GMT
Title: SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model
Authors: Yihao Liu and Feng Xue and Anlong Ming
Abstract summary: This paper proposes SM4Depth, a seamless MMDE method to address all the issues above within a single network. First, we reveal that a consistent field of view (FOV) is the key to resolving "metric ambiguity" across cameras. Second, to achieve consistently high accuracy across scenes, we explicitly model the metric scale determination as discretizing the depth interval into bins. Third, to reduce the reliance on massive training data, we propose a "divide and conquer" solution.
Score: 23.95095404136943
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The generalization of monocular metric depth estimation (MMDE) has been a longstanding challenge. Recent methods made progress by combining relative and metric depth or aligning input image focal length. However, they are still beset by challenges at the camera, scene, and data levels: (1) Sensitivity to different cameras; (2) Inconsistent accuracy across scenes; (3) Reliance on massive training data. This paper proposes SM4Depth, a seamless MMDE method, to address all the issues above within a single network. First, we reveal that a consistent field of view (FOV) is the key to resolving "metric ambiguity" across cameras, which guides us to propose a more straightforward preprocessing unit. Second, to achieve consistently high accuracy across scenes, we explicitly model the metric scale determination as discretizing the depth interval into bins and propose variation-based unnormalized depth bins. This method bridges the depth gap of diverse scenes by reducing the ambiguity of the conventional metric bin. Third, to reduce the reliance on massive training data, we propose a "divide and conquer" solution. Instead of estimating directly from the vast solution space, the correct metric bins are estimated from multiple solution sub-spaces for complexity reduction. Finally, with just 150K RGB-D pairs and a consumer-grade GPU for training, SM4Depth achieves state-of-the-art performance on most previously unseen datasets, especially surpassing ZoeDepth and Metric3D on mRI$_\theta$. The code can be found at https://github.com/1hao-Liu/SM4Depth.
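
The abstract's point about a consistent FOV can be illustrated with the standard pinhole-camera relation between image width, focal length, and horizontal field of view. The sketch below is not the SM4Depth preprocessing unit; the function names and the idea of cropping or padding to the computed width are assumptions made only for illustration, and lens distortion is ignored.

```python
import math


def horizontal_fov_deg(width_px: float, focal_px: float) -> float:
    """Horizontal FOV (degrees) of an ideal pinhole camera.

    Uses fov = 2 * atan(width / (2 * focal)), with both values in pixels.
    """
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))


def width_for_fov(focal_px: float, target_fov_deg: float) -> float:
    """Image width (pixels) that gives target_fov_deg at a fixed focal length.

    Hypothetical helper: cropping (or padding) an image to this width is one
    way two cameras could be brought to the same horizontal FOV.
    """
    return 2.0 * focal_px * math.tan(math.radians(target_fov_deg) / 2.0)


if __name__ == "__main__":
    # Two made-up cameras with different intrinsics.
    print(horizontal_fov_deg(640, 320))
    print(horizontal_fov_deg(1280, 1000))
    # Width the second camera would need to match a 90-degree FOV.
    print(width_for_fov(1000, 90.0))
```

Two images with the same pixel size but different focal lengths cover different fields of view, which is one reason a network can confuse a near, small scene with a far, large one.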

Related papers

ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth. Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module. Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
UniDepth: Universal Monocular Metric Depth Estimation [81.80512457953903]
We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains. Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations. Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth.
arXiv Detail & Related papers (2024-03-27T18:06:31Z)
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model [34.85279074665031]
Methods for monocular depth estimation have made significant strides on standard benchmarks, but zero-shot metric depth estimation remains unsolved. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. We advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization.
arXiv Detail & Related papers (2023-12-20T18:27:47Z)
NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation [58.21817572577012]
Video depth estimation aims to infer temporally consistent depth. We introduce NVDS+ that stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner. We also elaborate a large-scale Video Depth in the Wild dataset, which contains 14,203 videos with over two million frames.
arXiv Detail & Related papers (2023-07-17T17:57:01Z)
RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo [21.209964556493368]
RayMVSNet learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth. RayMVSNet++ achieves state-of-the-art performance on the ScanNet dataset.
arXiv Detail & Related papers (2023-07-16T02:10:47Z)
Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image. We then exploit 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data [110.29043712400912]
We present a method for depth estimation with monocular images, which can predict high-quality depth on diverse scenes up to an affine transformation. Experiments show that our method outperforms previous methods on 8 datasets by a large margin with the zero-shot test setting.
arXiv Detail & Related papers (2020-02-03T05:38:33Z)
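
Several entries above describe depth that is predicted only up to an unknown scale and shift (an affine transformation). A common way to compare such a prediction with metric ground truth is a closed-form least-squares fit of the two parameters. The sketch below is a generic illustration under that assumption and is not taken from any of the listed papers; the function name is hypothetical, and some methods perform this fit in disparity (inverse depth) space instead.

```python
from typing import Sequence, Tuple


def align_scale_shift(pred: Sequence[float],
                      target: Sequence[float]) -> Tuple[float, float]:
    """Return (scale, shift) minimizing sum((scale * p + shift - t) ** 2)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("prediction is constant; scale is undefined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


if __name__ == "__main__":
    relative = [1.0, 2.0, 3.0]   # made-up affine-invariant prediction
    metric = [5.0, 7.0, 9.0]     # made-up ground truth in metres
    s, b = align_scale_shift(relative, metric)
    print([s * d + b for d in relative])
```

Metric methods such as those in this list aim to make this post-hoc alignment unnecessary by predicting absolute depth directly.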

This list is automatically generated from the titles and abstracts of the papers in this site.