SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple
Cameras and Scenes by One Model
- URL: http://arxiv.org/abs/2403.08556v1
- Date: Wed, 13 Mar 2024 14:08:25 GMT
- Title: SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple
Cameras and Scenes by One Model
- Authors: Yihao Liu and Feng Xue and Anlong Ming
- Abstract summary: This paper proposes SM4Depth, a seamless MMDE method to address all the issues above within a single network.
First, we reveal that a consistent field of view (FOV) is the key to resolving "metric ambiguity" across cameras.
Second, to achieve consistently high accuracy across scenes, we explicitly model the metric scale determination as discretizing the depth interval into bins.
Third, to reduce the reliance on massive training data, we propose a "divide and conquer" solution.
- Score: 23.95095404136943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalization of monocular metric depth estimation (MMDE) has been a
longstanding challenge. Recent methods made progress by combining relative and
metric depth or aligning input image focal length. However, they are still
beset by challenges in camera, scene, and data levels: (1) Sensitivity to
different cameras; (2) Inconsistent accuracy across scenes; (3) Reliance on
massive training data. This paper proposes SM4Depth, a seamless MMDE method, to
address all the issues above within a single network. First, we reveal that a
consistent field of view (FOV) is the key to resolving ``metric ambiguity''
across cameras, which guides us to propose a more straightforward preprocessing
unit. Second, to achieve consistently high accuracy across scenes, we
explicitly model the metric scale determination as discretizing the depth
interval into bins and propose variation-based unnormalized depth bins. This
method bridges the depth gap of diverse scenes by reducing the ambiguity of the
conventional metric bin. Third, to reduce the reliance on massive training
data, we propose a ``divide and conquer'' solution. Instead of estimating
directly from the vast solution space, the correct metric bins are estimated
from multiple solution sub-spaces for complexity reduction. Finally, with just
150K RGB-D pairs and a consumer-grade GPU for training, SM4Depth achieves
state-of-the-art performance on most previously unseen datasets, especially
surpassing ZoeDepth and Metric3D on mRI$_\theta$. The code can be found at
https://github.com/1hao-Liu/SM4Depth.
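
The abstract names two ideas that a small example can make concrete: a consistent FOV across cameras, and depth expressed through bins over a depth interval. The sketch below is not taken from the SM4Depth code. It assumes an ideal pinhole camera and uses a generic bin formulation (expected depth as a probability-weighted sum of bin centers), not the paper's variation-based unnormalized bins. All function names are hypothetical.

```python
import math


def horizontal_fov_deg(focal_px: float, width_px: float) -> float:
    """Horizontal FOV (degrees) of an ideal pinhole camera."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))


def crop_width_for_fov(focal_px: float, target_fov_deg: float) -> float:
    """Image width (pixels) that spans target_fov_deg at this focal length."""
    return 2.0 * focal_px * math.tan(math.radians(target_fov_deg) / 2.0)


def depth_from_bins(probs, edges):
    """Expected depth for one pixel: probability-weighted sum of bin centers.

    probs: per-bin scores for the pixel (normalized here to sum to 1).
    edges: bin edges in meters, one more entry than probs.
    """
    if len(edges) != len(probs) + 1:
        raise ValueError("edges must have exactly len(probs) + 1 entries")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probs must have a positive sum")
    centers = [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(p * c for p, c in zip(probs, centers)) / total


if __name__ == "__main__":
    # Two hypothetical cameras with the same image width but different focal lengths.
    wide = horizontal_fov_deg(focal_px=500.0, width_px=1000.0)
    narrow = horizontal_fov_deg(focal_px=1000.0, width_px=1000.0)
    # Crop the wide camera's image so both cover the same FOV.
    crop = crop_width_for_fov(focal_px=500.0, target_fov_deg=narrow)
    print(f"wide FOV: {wide:.2f} deg, narrow FOV: {narrow:.2f} deg, crop: {crop:.1f} px")
    print("expected depth:", depth_from_bins([0.5, 0.5], [0.0, 2.0, 4.0]))
```

Under these assumptions, an image from the wider camera, once cropped to the computed width, covers the same angular extent as the narrower camera's image. This is one simple way to read "consistent FOV"; the paper's actual preprocessing unit may differ.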
Related papers
- UniDepth: Universal Monocular Metric Depth Estimation [81.80512457953903]
We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from single images alone across domains.
Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations.
Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth.
arXiv Detail & Related papers (2024-03-27T18:06:31Z)
- Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation [74.28509379811084]
Metric3D v2 is a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image.
We propose solutions for both metric depth estimation and surface normal estimation.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2024-03-22T02:30:46Z)
- Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection [10.696619570924778]
We propose Cross-view and Depth-guided Transformers for 3D Object Detection, CrossDTR.
Our method surpasses existing multi-camera methods by 10 percent in pedestrian detection and by about 3 percent in overall mAP and NDS metrics.
arXiv Detail & Related papers (2022-09-27T16:23:12Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Generalized Binary Search Network for Highly-Efficient Multi-View Stereo [10.367295443948487]
Multi-view Stereo (MVS) with known camera parameters is essentially a 1D search problem within a valid depth range.
Recent deep learning-based MVS methods typically densely sample depth hypotheses in the depth range.
We propose a novel method for highly efficient MVS that remarkably decreases the memory footprint.
arXiv Detail & Related papers (2021-12-04T13:57:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.