DiffusionDepth: Diffusion Denoising Approach for Monocular Depth
Estimation
- URL: http://arxiv.org/abs/2303.05021v4
- Date: Tue, 29 Aug 2023 05:20:36 GMT
- Title: DiffusionDepth: Diffusion Denoising Approach for Monocular Depth
Estimation
- Authors: Yiqun Duan, Xianda Guo, Zheng Zhu
- Abstract summary: DiffusionDepth is a new approach that reformulates monocular depth estimation as a denoising diffusion process.
It learns an iterative denoising process to denoise' random depth distribution into a depth map with the guidance of monocular visual conditions.
Experimental results on KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion approach could reach state-of-the-art performance in both indoor and outdoor scenarios with acceptable inference time.
- Score: 23.22005119986485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation is a challenging task that predicts the pixel-wise
depth from a single 2D image. Current methods typically model this problem as a
regression or classification task. We propose DiffusionDepth, a new approach
that reformulates monocular depth estimation as a denoising diffusion process.
It learns an iterative denoising process to `denoise' random depth distribution
into a depth map with the guidance of monocular visual conditions. The process
is performed in the latent space encoded by a dedicated depth encoder and
decoder. Instead of diffusing ground truth (GT) depth, the model learns to
reverse the process of diffusing the refined depth of itself into random depth
distribution. This self-diffusion formulation overcomes the difficulty of
applying generative models to sparse GT depth scenarios. The proposed approach
benefits this task by refining depth estimation step by step, which is superior
for generating accurate and highly detailed depth maps. Experimental results on
KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion
approach could reach state-of-the-art performance in both indoor and outdoor
scenarios with acceptable inference time.
Related papers
- Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian [49.21866794516328]
3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis.
Previous approaches have incorporated depth supervision into the training of 3D Gaussians to mitigate overfitting.
We introduce a novel method to supervise the depth distribution of 3D Gaussians, utilizing depth priors with integrated uncertainty estimates.
arXiv Detail & Related papers (2024-05-30T03:18:30Z) - Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging
Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z) - MonoDiffusion: Self-Supervised Monocular Depth Estimation Using
Diffusion Model [17.68594761862957]
We introduce a novel self-supervised depth estimation framework, dubbed MonoDiffusion, by formulating it as an iterative denoising process.
Because the depth ground-truth is unavailable in the training phase, we develop a pseudo ground-truth diffusion process to assist the diffusion in MonoDiffusion.
The pseudo ground-truth diffusion gradually adds noise to the depth map generated by a pre-trained teacher model.
arXiv Detail & Related papers (2023-11-13T09:38:30Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method's accuracy (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks, however, that is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - Non-learning Stereo-aided Depth Completion under Mis-projection via
Selective Stereo Matching [0.5067618621449753]
We propose a non-learning depth completion method for a sparse depth map captured using a light detection and ranging (LiDAR) sensor guided by a pair of stereo images.
The proposed method reduced the mean absolute error (MAE) of the depth estimation to 0.65 times and demonstrated approximately twice more accurate estimation in the long range.
arXiv Detail & Related papers (2022-10-04T07:46:56Z) - RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation [27.679479140943503]
We propose a resolution adaptive self-supervised monocular depth estimation method (RA-Depth) by learning the scale invariance of the scene depth.
RA-Depth achieves state-of-the-art performance, and also exhibits a good ability of resolution adaptation.
arXiv Detail & Related papers (2022-07-25T08:49:59Z) - Non-parametric Depth Distribution Modelling based Depth Inference for
Multi-view Stereo [43.415242967722804]
Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo.
In general, those approaches assume that the depth of each pixel follows a unimodal distribution.
We propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.
arXiv Detail & Related papers (2022-05-08T05:13:04Z) - End-to-end Learning for Joint Depth and Image Reconstruction from
Diffracted Rotation [10.896567381206715]
We propose a novel end-to-end learning approach for depth from diffracted rotation.
Our approach requires a significantly less complex model and less training data, yet it is superior to existing methods in the task of monocular depth estimation.
arXiv Detail & Related papers (2022-04-14T16:14:37Z) - Depth Completion using Plane-Residual Representation [84.63079529738924]
We introduce a novel way of interpreting depth information with the closest depth plane label $p$ and a residual value $r$, as we call it, Plane-Residual (PR) representation.
By interpreting depth information in PR representation and using our corresponding depth completion network, we were able to acquire improved depth completion performance with faster computation.
arXiv Detail & Related papers (2021-04-15T10:17:53Z) - Efficient Depth Completion Using Learned Bases [94.0808155168311]
We propose a new global geometry constraint for depth completion.
By assuming depth maps often lay on low dimensional subspaces, a dense depth map can be approximated by a weighted sum of full-resolution principal depth bases.
arXiv Detail & Related papers (2020-12-02T11:57:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.