VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation
- URL: http://arxiv.org/abs/2504.15095v3
- Date: Sun, 27 Apr 2025 09:34:34 GMT
- Title: VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation
- Authors: Mingxia Zhan, Li Zhang, Xiaomeng Chu, Beibei Wang
- Abstract summary: VistaDepth is a novel framework that integrates adaptive frequency-domain feature enhancements with an adaptive weight-balancing mechanism. VistaDepth achieves state-of-the-art performance among diffusion-based MDE techniques, particularly excelling in the accurate reconstruction of distant regions.
- Score: 8.66253032039513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation (MDE) aims to predict per-pixel depth values from a single RGB image. Recent advancements have positioned diffusion models as effective MDE tools by framing the challenge as a conditional image generation task. Despite their progress, these methods often struggle with accurately reconstructing distant depths, due largely to the imbalanced distribution of depth values and an over-reliance on spatial-domain features. To overcome these limitations, we introduce VistaDepth, a novel framework that integrates adaptive frequency-domain feature enhancements with an adaptive weight-balancing mechanism into the diffusion process. Central to our approach is the Latent Frequency Modulation (LFM) module, which dynamically refines spectral responses in the latent feature space, thereby improving the preservation of structural details and reducing noisy artifacts. Furthermore, we implement an adaptive weighting strategy that modulates the diffusion loss in real-time, enhancing the model's sensitivity towards distant depth reconstruction. These innovations collectively result in superior depth perception performance across both distance and detail. Experimental evaluations confirm that VistaDepth achieves state-of-the-art performance among diffusion-based MDE techniques, particularly excelling in the accurate reconstruction of distant regions.
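The abstract describes the two components only at a high level. As a minimal, hypothetical sketch (not the paper's actual implementation; the function names, the FFT-based spectral gain, and the loss form are all assumptions for illustration), per-frequency modulation of a latent feature map and a depth-weighted loss could look like:

```python
import numpy as np

def latent_frequency_modulation(latent, gain):
    """Toy stand-in for an LFM-style module: scale the latent map's
    spectrum by a per-frequency gain, then transform back."""
    spectrum = np.fft.fft2(latent)           # spatial -> frequency domain
    modulated = spectrum * gain              # element-wise spectral reweighting
    return np.real(np.fft.ifft2(modulated))  # frequency -> spatial domain

def depth_weighted_mse(pred, target, alpha=1.0):
    """Toy stand-in for an adaptive weighting strategy: up-weight pixels
    with larger (more distant) ground-truth depth to counter the
    imbalanced distribution of depth values."""
    weights = 1.0 + alpha * target / target.max()
    return float(np.mean(weights * (pred - target) ** 2))
```

With `gain` set to all ones the modulation is the identity, and with `alpha = 0` the loss reduces to plain MSE; both make useful sanity checks for any such reweighting scheme.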
Related papers
- Quaternion Wavelet-Conditioned Diffusion Models for Image Super-Resolution [4.307648859471193]
We introduce ResQu, a novel SR framework that integrates a quaternion wavelet preprocessing framework with latent diffusion models.
Our approach enhances the conditioning process by exploiting quaternion wavelet embeddings, which are dynamically integrated at different stages of denoising.
Our method achieves outstanding SR results, outperforming existing approaches in perceptual quality and standard evaluation metrics in many cases.
arXiv Detail & Related papers (2025-05-01T06:17:33Z)
- FUSION: Frequency-guided Underwater Spatial Image recOnstructioN [0.0]
Underwater images suffer from severe degradations, including color distortions, reduced visibility, and loss of structural details due to wavelength-dependent attenuation and scattering.
Existing enhancement methods primarily focus on spatial-domain processing, neglecting the frequency domain's potential to capture global color distributions and long-range dependencies.
We propose FUSION, a dual-domain deep learning framework that jointly leverages spatial and frequency domain information.
arXiv Detail & Related papers (2025-04-01T23:16:19Z)
- FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation [31.06080108012735]
We propose an efficient Monocular Depth Estimation (MDE) approach named FiffDepth. FiffDepth transforms diffusion-based image generators into a feed-forward architecture for detailed depth estimation. We demonstrate that FiffDepth achieves exceptional accuracy, stability, and fine-grained detail, offering significant improvements in MDE performance.
arXiv Detail & Related papers (2024-12-01T04:59:34Z)
- Self-supervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion [16.673178271652553]
Self-supervised monocular depth estimation has received widespread attention because of its capability to train without ground truth. We employ the generative-based diffusion model with a unique denoising training process for self-supervised monocular depth estimation. We conduct experiments on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2024-06-14T07:31:20Z)
- Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z)
- Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation [0.0]
In this paper, we propose a learnable module, Adaptive Discrete Disparity Volume (ADDV). ADDV is capable of dynamically sensing depth distributions in different RGB images and generating adaptive bins for them. We also introduce novel training strategies - uniformizing and sharpening - to provide regularizations under self-supervised conditions.
arXiv Detail & Related papers (2024-04-04T04:22:25Z)
- Generating Content for HDR Deghosting from Frequency View [56.103761824603644]
Recent Diffusion Models (DMs) have been introduced in the HDR imaging field.
DMs require extensive iterations with large models to estimate entire images.
We propose the Low-Frequency aware Diffusion (LF-Diff) model for ghost-free HDR imaging.
arXiv Detail & Related papers (2024-04-01T01:32:11Z)
- LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models [54.93010869546011]
We propose to leverage the pre-trained latent diffusion model to perform the neural ISP for enhancing extremely low-light images. Specifically, to tailor the pre-trained latent diffusion model to operate on the RAW domain, we train a set of lightweight taming modules. We observe different roles of UNet denoising and decoder reconstruction in the latent diffusion model, which inspires us to decompose the low-light image enhancement task into latent-space low-frequency content generation and decoding-phase high-frequency detail maintenance.
arXiv Detail & Related papers (2023-12-02T04:31:51Z)
- Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z)
- Spatial Attention-based Distribution Integration Network for Human Pose Estimation [0.8052382324386398]
We present the Spatial Attention-based Distribution Integration Network (SADI-NET) to improve the accuracy of localization.
Our network consists of three efficient modules: the receptive fortified module (RFM), the spatial fusion module (SFM), and the distribution learning module (DLM).
Our model obtained a remarkable 92.10% accuracy on the MPII test dataset, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
arXiv Detail & Related papers (2023-11-09T12:43:01Z)
- Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction [75.91471250967703]
We introduce a novel sampling framework called Steerable Conditional Diffusion.
This framework adapts the diffusion model, concurrently with image reconstruction, based solely on the information provided by the available measurement.
We achieve substantial enhancements in out-of-distribution performance across diverse imaging modalities.
arXiv Detail & Related papers (2023-08-28T08:47:06Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from long runtimes, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.