Related papers: Digging into contrastive learning for robust depth estimation with diffusion models

Digging into contrastive learning for robust depth estimation with diffusion models

URL: http://arxiv.org/abs/2404.09831v4
Date: Sun, 22 Sep 2024 08:23:56 GMT
Title: Digging into contrastive learning for robust depth estimation with diffusion models
Authors: Jiyuan Wang, Chunyu Lin, Lang Nie, Kang Liao, Shuwei Shao, Yao Zhao,
Abstract summary: We propose a novel robust depth estimation method called D4RD. It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
Score: 55.62276027922499
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, diffusion-based depth estimation methods have drawn widespread attention due to their elegant denoising patterns and promising performance. However, they are typically unreliable under adverse conditions prevalent in real-world scenarios, such as rainy, snowy, etc. In this paper, we propose a novel robust depth estimation method called D4RD, featuring a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. Concretely, we integrate the strength of knowledge distillation into contrastive learning, building the `trinity' contrastive scheme. This scheme utilizes the sampled noise of the forward diffusion process as a natural reference, guiding the predicted noise in diverse scenes toward a more stable and precise optimum. Moreover, we extend noise-level trinity to encompass more generic feature and image levels, establishing a multi-level contrast to distribute the burden of robust perception across the overall network. Before addressing complex scenarios, we enhance the stability of the baseline diffusion model with three straightforward yet effective improvements, which facilitate convergence and remove depth outliers. Extensive experiments demonstrate that D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions. Source code and data are available at \url{https://github.com/wangjiyuan9/D4RD}.

Related papers

A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields [46.03878128080501]
We propose to employ view-consistent distributions instead of fixed depth value estimations to regularize NeRF training.<n>By sampling from the view-consistency distributions, an implicit regularization is imposed on the training of NeRF.<n>We also utilize a depth-pushing loss that works in conjunction with the sampling technique to jointly provide effective regularizations for eliminating the failure modes.
arXiv Detail & Related papers (2025-07-06T14:48:10Z)
Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability.<n>We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
Beyond Classification: Evaluating Diffusion Denoised Smoothing for Security-Utility Trade off [4.497768222083102]
Diffusion Denoised Smoothing is emerging as a promising technique to enhance model robustness.<n>We analyze three datasets with four distinct downstream tasks under three different adversarial attack algorithms.<n>High-noise diffusion denoising to clean images without any distortions significantly degrades performance by as high as 57%.<n>We introduce a novel attack strategy specifically targeting the diffusion process itself, capable of circumventing defenses in the low-noise regime.
arXiv Detail & Related papers (2025-05-21T14:49:24Z)
BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution [19.568467335629094]
We introduce the Bayesian Uncertainty Guided Diffusion Probabilistic Model (BUFF) BUFF distinguishes itself by incorporating a Bayesian network to generate high-resolution uncertainty masks. It significantly mitigates artifacts and blurring in areas characterized by complex textures and fine details.
arXiv Detail & Related papers (2025-04-04T14:43:45Z)
One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
Generalized Diffusion Model with Adjusted Offset Noise [1.7767466724342067]
We propose a generalized diffusion model that naturally incorporates additional noise within a rigorous probabilistic framework. We derive a loss function based on the evidence lower bound, establishing its theoretical equivalence to offset noise with certain adjustments. Experiments on synthetic datasets demonstrate that our model effectively addresses brightness-related challenges and outperforms conventional methods in high-dimensional scenarios.
arXiv Detail & Related papers (2024-12-04T08:57:03Z)
Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion [21.939618694037108]
Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth. We employ a well-converging diffusion model among generative networks for unsupervised monocular depth estimation. This model significantly enriches the model's capacity for learning and interpreting depth distribution.
arXiv Detail & Related papers (2024-06-14T07:31:20Z)
DepthFM: Fast Monocular Depth Estimation with Flow Matching [22.206355073676082]
Current discriminative depth estimation methods often produce blurry artifacts, while generative approaches suffer from slow sampling due to curvatures in the noise-to-depth transport. Our method addresses these challenges by framing depth estimation as a direct transport between image and depth distributions. Our approach achieves competitive zero-shot performance on standard benchmarks of complex natural scenes while improving sampling efficiency and only requiring minimal synthetic data for training.
arXiv Detail & Related papers (2024-03-20T17:51:53Z)
Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation [33.140210057065644]
This paper introduces a novel approach named Stealing Stable Diffusion (SSD) prior for robust monocular depth estimation. The approach addresses this limitation by utilizing stable diffusion to generate synthetic images that mimic challenging conditions. The effectiveness of the approach is evaluated on nuScenes and Oxford RobotCar, two challenging public datasets.
arXiv Detail & Related papers (2024-03-08T05:06:31Z)
Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z)
Global Structure-Aware Diffusion Process for Low-Light Image Enhancement [64.69154776202694]
This paper studies a diffusion-based framework to address the low-light image enhancement problem. We advocate for the regularization of its inherent ODE-trajectory. Experimental evaluations reveal that the proposed framework attains distinguished performance in low-light enhancement.
arXiv Detail & Related papers (2023-10-26T17:01:52Z)
Seismic Data Interpolation via Denoising Diffusion Implicit Models with Coherence-corrected Resampling [7.755439545030289]
Deep learning models such as U-Net often underperform when the training and test missing patterns do not match. We propose a novel framework that is built upon the multi-modal diffusion models. Inference phase, we introduce the denoising diffusion implicit model to reduce the number of sampling steps. To enhance the coherence and continuity between the revealed traces and the missing traces, we propose two strategies.
arXiv Detail & Related papers (2023-07-09T16:37:47Z)
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation [42.48819460873482]
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions.
arXiv Detail & Related papers (2023-06-02T21:26:20Z)
Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming, excessive computational resource consumption, and unstable restoration. We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images. Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations. Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance. We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring. Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.