Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion
- URL: http://arxiv.org/abs/2406.09782v1
- Date: Fri, 14 Jun 2024 07:31:20 GMT
- Title: Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion
- Authors: Runze Liu, Dongchen Zhu, Guanghui Zhang, Yue Xu, Wenjun Shi, Xiaolin Zhang, Lei Wang, Jiamao Li
- Abstract summary: Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth.
We employ a well-converging diffusion model among generative networks for unsupervised monocular depth estimation.
A hierarchical feature-guided denoising module enriches the model's capacity for learning and interpreting depth distribution.
- Score: 21.939618694037108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth. In real-world scenarios, the images may be blurry or noisy due to the influence of weather conditions and inherent limitations of the camera. Therefore, it is particularly important to develop a robust depth estimation model. Benefiting from the training strategies of generative networks, generative-based methods often exhibit enhanced robustness. In light of this, we employ a well-converging diffusion model among generative networks for unsupervised monocular depth estimation. Additionally, we propose a hierarchical feature-guided denoising module. This model significantly enriches the model's capacity for learning and interpreting depth distribution by fully leveraging image features to guide the denoising process. Furthermore, we explore the implicit depth within reprojection and design an implicit depth consistency loss. This loss function serves to enhance the performance of the model and ensure the scale consistency of depth within a video sequence. We conduct experiments on the KITTI, Make3D, and our self-collected SIMIT datasets. The results indicate that our approach stands out among generative-based models, while also showcasing remarkable robustness.
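The abstract refers to reprojection and to a depth that is implicit in it, but gives no formulas. As a rough illustration only, the Python sketch below shows standard pinhole reprojection of one target-view pixel into a source view, which also yields the depth of that point in the source camera frame. The function names, the intrinsics values, and the normalized-difference error are assumptions chosen for illustration; they are not the authors' implicit depth consistency loss.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def reproject_pixel(
    u: float,
    v: float,
    depth: float,
    intrinsics: Tuple[float, float, float, float],
    rotation: Matrix,
    translation: Sequence[float],
) -> Tuple[float, float, float]:
    """Map a target-view pixel into a source view (hypothetical helper).

    Returns (u_src, v_src, z_src), where z_src is the depth of the 3D point
    expressed in the source camera frame.
    """
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in the target camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigidly transform the point into the source camera frame.
    xs, ys, zs = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zs <= 0:
        raise ValueError("point lies behind the source camera")
    # Project with the same pinhole intrinsics.
    return fx * xs / zs + cx, fy * ys / zs + cy, zs


def depth_consistency_error(z_projected: float, z_predicted: float) -> float:
    """Normalized difference between two positive depths (illustrative only)."""
    return abs(z_projected - z_predicted) / (z_projected + z_predicted)


# Usage with assumed values: the camera moves 1 m forward between frames.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
K = (720.0, 720.0, 620.0, 187.0)  # fx, fy, cx, cy (made-up intrinsics)
u_src, v_src, z_src = reproject_pixel(700.0, 200.0, 10.0, K, IDENTITY, (0.0, 0.0, -1.0))
error = depth_consistency_error(z_src, 9.0)
```

In an unsupervised pipeline the depth and the relative pose would both come from networks, and the reprojected coordinates would be used to sample the source image for a photometric loss. Comparing `z_src` with the depth predicted for the source frame is one common way to encourage scale consistency across a video sequence.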
Related papers
- Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation [9.032563775151074]
Monocular depth estimation is a key technique for 3D perception in computer vision.
It faces significant challenges in real-world scenarios, which encompass adverse weather variations, motion blur, as well as scenes with poor lighting conditions at night.
We devise a novel approach to reduce over-reliance on local textures, enhancing robustness against missing or interfering patterns.
arXiv Detail & Related papers (2024-10-09T15:20:29Z)
- Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z)
- Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation [0.0]
In this paper, we propose a learnable module, Adaptive Discrete Disparity Volume (ADDV).
ADDV is capable of dynamically sensing depth distributions in different RGB images and generating adaptive bins for them.
We also introduce novel training strategies - uniformizing and sharpening - to provide regularizations under self-supervised conditions.
arXiv Detail & Related papers (2024-04-04T04:22:25Z)
- Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation [33.140210057065644]
This paper introduces a novel approach named Stealing Stable Diffusion (SSD) prior for robust monocular depth estimation.
The approach utilizes stable diffusion to generate synthetic images that mimic challenging conditions.
The effectiveness of the approach is evaluated on nuScenes and Oxford RobotCar, two challenging public datasets.
arXiv Detail & Related papers (2024-03-08T05:06:31Z)
- Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z)
- The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation [42.48819460873482]
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions.
arXiv Detail & Related papers (2023-06-02T21:26:20Z)
- Monocular Depth Estimation using Diffusion Models [39.27361388836347]
We introduce innovations to address problems arising due to noisy, incomplete depth maps in training data.
To cope with the limited availability of data for supervised training, we leverage pre-training on self-supervised image-to-image translation tasks.
Our DepthGen model achieves SOTA performance on the indoor NYU dataset, and near SOTA results on the outdoor KITTI dataset.
arXiv Detail & Related papers (2023-02-28T18:08:21Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
However, it relies on the multi-view consistency assumption for training networks, which is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate the performance degradation caused by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.