GEDepth: Ground Embedding for Monocular Depth Estimation
- URL: http://arxiv.org/abs/2309.09975v1
- Date: Mon, 18 Sep 2023 17:56:06 GMT
- Title: GEDepth: Ground Embedding for Monocular Depth Estimation
- Authors: Xiaodong Yang, Zhuang Ma, Zhiyu Ji, Zhe Ren
- Abstract summary: This paper proposes a novel ground embedding module to decouple camera parameters from pictorial cues.
A ground attention mechanism is designed in the module to optimally combine ground depth with residual depth.
Experiments reveal that our approach achieves state-of-the-art results on popular benchmarks.
- Score: 4.95394574147086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation is an ill-posed problem as the same 2D image can
be projected from infinitely many 3D scenes. Although the leading algorithms in this
field have reported significant improvements, they are essentially geared to a
particular combination of pictorial observations and camera parameters (i.e.,
intrinsics and extrinsics), strongly limiting their generalizability in
real-world scenarios. To cope with this challenge, this paper proposes a novel
ground embedding module to decouple camera parameters from pictorial cues, thus
promoting the generalization capability. Given camera parameters, the proposed
module generates the ground depth, which is stacked with the input image and
referenced in the final depth prediction. A ground attention mechanism is designed in the
module to optimally combine ground depth with residual depth. Our ground
embedding is highly flexible and lightweight, leading to a plug-in module that
is amenable to integration into various depth estimation networks.
Experiments reveal that our approach achieves state-of-the-art results on
popular benchmarks, and more importantly, renders significant generalization
improvement on a wide range of cross-domain tests.
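To make the mechanism concrete, here is a minimal NumPy sketch of the ground-depth map such a module could derive for a level, forward-facing pinhole camera. The function name, the flat-ground and zero-pitch assumptions, and the blending comment at the end are ours (the paper's module consumes full intrinsics and extrinsics), so treat this as an illustration rather than the authors' implementation.

```python
import numpy as np

def ground_plane_depth(h, w, fy, cy, cam_height):
    """Per-pixel depth of a flat ground plane seen by a level pinhole camera.

    A ray through pixel row v below the principal point cy meets the
    ground at depth z = fy * cam_height / (v - cy); rows at or above the
    horizon never hit the plane and are marked invalid (inf).
    """
    v = np.arange(h, dtype=np.float64)
    z = np.full(h, np.inf)
    below = v > cy
    z[below] = fy * cam_height / (v[below] - cy)
    return np.tile(z[:, None], (1, w))  # ground depth is constant along each row

# Example with a KITTI-like camera mounted 1.65 m above the ground.
ground = ground_plane_depth(h=375, w=1242, fy=721.5, cy=187.0, cam_height=1.65)

# One plausible reading of the ground attention (hypothetical; per-pixel
# weights attn in [0, 1], residual predicted by the network):
# depth = attn * (ground + residual) + (1 - attn) * residual
```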
Related papers
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
- NDDepth: Normal-Distance Assisted Monocular Depth Estimation [22.37113584192617]
We propose a novel physics (geometry)-driven deep learning framework for monocular depth estimation.
We introduce a new normal-distance head that outputs pixel-level surface normal and plane-to-origin distance for deriving depth at each position.
We develop an effective contrastive iterative refinement module that refines depth in a complementary manner according to the depth uncertainty.
arXiv Detail & Related papers (2023-09-19T13:05:57Z)
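The normal-distance parametrization above has a standard closed-form link to depth. The sketch below shows it; the function and argument names are ours, and the denominator is clamped for numerical stability, which the paper may handle differently.

```python
import numpy as np

def depth_from_normal_distance(normal, distance, K):
    """Closed-form depth from a per-pixel surface normal n and a
    plane-to-origin distance d: the 3D point at pixel (u, v) is
    z * K^-1 [u, v, 1]^T and lies on the plane n . P = d, hence
    z = d / (n . K^-1 [u, v, 1]^T).
    """
    h, w = distance.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    rays = np.stack([u, v, np.ones_like(u)], axis=-1) @ np.linalg.inv(K).T
    denom = np.sum(normal * rays, axis=-1)
    return distance / np.clip(np.abs(denom), 1e-6, None)  # clamp degenerate rays
```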
- ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive depth range and depth interval [19.28042366225802]
Multi-View Stereo (MVS) is a fundamental problem in geometric computer vision.
We present a novel multi-stage coarse-to-fine framework to achieve adaptive all-pixel depth range and depth interval.
Our model achieves state-of-the-art performance and yields competitive generalization ability.
arXiv Detail & Related papers (2023-08-17T14:52:11Z)
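As a hedged illustration of the adaptive range/interval idea, the generic coarse-to-fine sampling pattern below centers a per-pixel depth range on the previous stage's estimate and splits it uniformly. This is the common template such methods build on, not ARAI-MVSNet's exact scheme; all names are ours.

```python
import numpy as np

def adaptive_hypotheses(coarse_depth, half_range, num_planes):
    """Generic coarse-to-fine sampling: center a per-pixel depth range on
    the previous stage's estimate and split it into uniform intervals, so
    every pixel gets its own range and interval."""
    lo = np.clip(coarse_depth - half_range, 1e-3, None)  # keep depths positive
    hi = coarse_depth + half_range
    steps = np.linspace(0.0, 1.0, num_planes)[:, None, None]  # (num_planes, 1, 1)
    return lo[None] + steps * (hi - lo)[None]                 # (num_planes, H, W)
```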
- A Simple Baseline for Supervised Surround-view Depth Estimation [25.81521612343612]
We propose S3Depth, a Simple Baseline for Supervised Surround-view Depth Estimation.
We employ a global-to-local feature extraction module which combines CNN with transformer layers for enriched representations.
Our method achieves superior performance over existing state-of-the-art methods on both DDAD and nuScenes datasets.
arXiv Detail & Related papers (2023-03-14T10:06:19Z)
- Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on the DDAD and nuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- Facial Depth and Normal Estimation using Single Dual-Pixel Camera [81.02680586859105]
We introduce a DP-oriented Depth/Normal network that reconstructs the 3D facial geometry.
The accompanying dataset contains the corresponding ground-truth 3D models, including depth maps and surface normals in metric scale.
It achieves state-of-the-art performance over recent DP-based depth/normal estimation methods.
arXiv Detail & Related papers (2021-11-25T05:59:27Z)
- EdgeConv with Attention Module for Monocular Depth Estimation [4.239147046986999]
To generate accurate depth maps, it is important for the model to learn structural information about the scene.
We propose a novel Patch-Wise EdgeConv Module (PEM) and an EdgeConv Attention Module (EAM) to address this difficulty in monocular depth estimation.
Our method is evaluated on two popular datasets, the NYU Depth V2 and the KITTI split, achieving state-of-the-art performance.
arXiv Detail & Related papers (2021-06-16T08:15:20Z)
- Video Depth Estimation by Fusing Flow-to-Depth Proposals [65.24533384679657]
We present an approach with a differentiable flow-to-depth layer for video depth estimation.
The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network.
Our approach outperforms state-of-the-art depth estimation methods and shows reasonable cross-dataset generalization capability.
arXiv Detail & Related papers (2019-12-30T10:45:57Z)
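The flow-to-depth conversion admits a compact geometric sketch: treating each flow vector as a two-view correspondence, depth follows from standard triangulation given the relative pose. The function below is our generic rendition under a pinhole model (all names are ours), not the paper's differentiable layer.

```python
import numpy as np

def flow_to_depth(flow, K, R, t):
    """Depth proposals from optical flow and relative pose via two-view
    triangulation: with rays r1 = K^-1 [u, v, 1]^T in frame 1 and
    r2 = K^-1 [u + fu, v + fv, 1]^T in frame 2, z2 * r2 = R @ (z1 * r1) + t;
    crossing both sides with r2 removes z2 and yields z1 in least squares.
    """
    h, w = flow.shape[:2]
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    Kinv = np.linalg.inv(K)
    r1 = np.stack([u, v, np.ones_like(u)], axis=-1) @ Kinv.T
    r2 = np.stack([u + flow[..., 0], v + flow[..., 1], np.ones_like(u)],
                  axis=-1) @ Kinv.T
    a = np.cross(r2, r1 @ R.T)  # r2 x (R @ r1), per pixel
    b = np.cross(r2, t)         # r2 x t (t broadcasts over pixels)
    # 0 = z1 * a + b per pixel  =>  z1 = -(a . b) / (a . a)
    return -np.sum(a * b, axis=-1) / np.clip(np.sum(a * a, axis=-1), 1e-12, None)
```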