Digging into contrastive learning for robust depth estimation with diffusion models
- URL: http://arxiv.org/abs/2404.09831v4
- Date: Sun, 22 Sep 2024 08:23:56 GMT
- Title: Digging into contrastive learning for robust depth estimation with diffusion models
- Authors: Jiyuan Wang, Chunyu Lin, Lang Nie, Kang Liao, Shuwei Shao, Yao Zhao,
- Abstract summary: We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
- Score: 55.62276027922499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, diffusion-based depth estimation methods have drawn widespread attention due to their elegant denoising patterns and promising performance. However, they are typically unreliable under adverse conditions prevalent in real-world scenarios, such as rainy, snowy, etc. In this paper, we propose a novel robust depth estimation method called D4RD, featuring a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. Concretely, we integrate the strength of knowledge distillation into contrastive learning, building the `trinity' contrastive scheme. This scheme utilizes the sampled noise of the forward diffusion process as a natural reference, guiding the predicted noise in diverse scenes toward a more stable and precise optimum. Moreover, we extend noise-level trinity to encompass more generic feature and image levels, establishing a multi-level contrast to distribute the burden of robust perception across the overall network. Before addressing complex scenarios, we enhance the stability of the baseline diffusion model with three straightforward yet effective improvements, which facilitate convergence and remove depth outliers. Extensive experiments demonstrate that D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions. Source code and data are available at \url{https://github.com/wangjiyuan9/D4RD}.
Related papers
- Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion [21.939618694037108]
Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth.
We employ a well-converging diffusion model among generative networks for unsupervised monocular depth estimation.
This model significantly enriches the model's capacity for learning and interpreting depth distribution.
arXiv Detail & Related papers (2024-06-14T07:31:20Z) - Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation [33.140210057065644]
This paper introduces a novel approach named Stealing Stable Diffusion (SSD) prior for robust monocular depth estimation.
The approach addresses this limitation by utilizing stable diffusion to generate synthetic images that mimic challenging conditions.
The effectiveness of the approach is evaluated on nuScenes and Oxford RobotCar, two challenging public datasets.
arXiv Detail & Related papers (2024-03-08T05:06:31Z) - Learn to Optimize Denoising Scores for 3D Generation: A Unified and
Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z) - Global Structure-Aware Diffusion Process for Low-Light Image Enhancement [64.69154776202694]
This paper studies a diffusion-based framework to address the low-light image enhancement problem.
We advocate for the regularization of its inherent ODE-trajectory.
Experimental evaluations reveal that the proposed framework attains distinguished performance in low-light enhancement.
arXiv Detail & Related papers (2023-10-26T17:01:52Z) - Seismic Data Interpolation via Denoising Diffusion Implicit Models with Coherence-corrected Resampling [7.755439545030289]
Deep learning models such as U-Net often underperform when the training and test missing patterns do not match.
We propose a novel framework that is built upon the multi-modal diffusion models.
Inference phase, we introduce the denoising diffusion implicit model to reduce the number of sampling steps.
To enhance the coherence and continuity between the revealed traces and the missing traces, we propose two strategies.
arXiv Detail & Related papers (2023-07-09T16:37:47Z) - The Surprising Effectiveness of Diffusion Models for Optical Flow and
Monocular Depth Estimation [42.48819460873482]
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions.
arXiv Detail & Related papers (2023-06-02T21:26:20Z) - Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.