DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain
- URL: http://arxiv.org/abs/2410.14980v2
- Date: Tue, 22 Oct 2024 14:27:32 GMT
- Title: DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain
- Authors: Kun Wang, Zhiqiang Yan, Junkai Fan, Wanlu Zhu, Xiang Li, Jun Li, Jian Yang
- Abstract summary: DCDepth is a novel framework for the long-standing monocular depth estimation task.
It estimates the frequency coefficients of depth patches after transforming them into the discrete cosine domain.
We conduct comprehensive experiments on NYU-Depth-V2, TOFDC, and KITTI datasets, and demonstrate the state-of-the-art performance of DCDepth.
- Score: 20.55626048513748
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task. Moving beyond conventional pixel-wise depth estimation in the spatial domain, our approach estimates the frequency coefficients of depth patches after transforming them into the discrete cosine domain. This unique formulation allows for the modeling of local depth correlations within each patch. Crucially, the frequency transformation segregates the depth information into various frequency components, with low-frequency components encapsulating the core scene structure and high-frequency components detailing the finer aspects. This decomposition forms the basis of our progressive strategy, which begins with the prediction of low-frequency components to establish a global scene context, followed by successive refinement of local details through the prediction of higher-frequency components. We conduct comprehensive experiments on NYU-Depth-V2, TOFDC, and KITTI datasets, and demonstrate the state-of-the-art performance of DCDepth. Code is available at https://github.com/w2kun/DCDepth.
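As a concrete illustration of the formulation, the sketch below (not the authors' implementation; the helper name and the simple zig-zag ordering are assumptions) transforms an 8x8 depth patch with a 2D DCT and reconstructs it progressively, keeping more coefficients at each step so that low-frequency structure appears first and high-frequency detail is added last.

```python
import numpy as np
from scipy.fft import dctn, idctn

def progressive_dct_reconstructions(depth_patch, patch_size=8):
    """Yield reconstructions of a depth patch that keep the first k low-to-high
    ordered DCT coefficients, for increasing k (hypothetical helper)."""
    coeffs = dctn(depth_patch, norm="ortho")           # patch -> discrete cosine domain
    # Order coefficients from low to high frequency (i + j as a simple zig-zag proxy).
    order = sorted(
        ((i, j) for i in range(patch_size) for j in range(patch_size)),
        key=lambda ij: (ij[0] + ij[1], ij[0]),
    )
    kept = np.zeros_like(coeffs)
    for i, j in order:
        kept[i, j] = coeffs[i, j]                      # reveal one more frequency component
        yield idctn(kept, norm="ortho")                # reconstruction so far

patch = np.random.rand(8, 8).astype(np.float32)        # stand-in for an 8x8 depth patch
recons = list(progressive_dct_reconstructions(patch))
print(np.abs(recons[0] - patch).max())                 # coarse: only the DC term kept
print(np.abs(recons[-1] - patch).max())                # ~0: all coefficients restored
```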
Related papers
- DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion [59.25479674775212]
DepR is a depth-guided single-view scene reconstruction framework. It generates individual objects and composes them into a coherent 3D layout. It achieves state-of-the-art performance despite being trained on limited synthetic data.
arXiv Detail & Related papers (2025-07-30T16:40:46Z) - Learning Inverse Laplacian Pyramid for Progressive Depth Completion [18.977393635158048]
LP-Net is an innovative framework that implements a multi-scale, progressive prediction paradigm based on Laplacian Pyramid decomposition.
At the time of submission, LP-Net ranks 1st among all peer-reviewed methods on the official KITTI leaderboard.
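For readers unfamiliar with the underlying decomposition, here is a minimal NumPy/SciPy sketch of a Laplacian pyramid round trip on a depth map; the function names and pyramid depth are illustrative assumptions, not LP-Net's actual implementation.

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def build_laplacian_pyramid(depth, levels=3):
    """Decompose a depth map into high-frequency residuals plus a coarse base."""
    pyramid, current = [], depth.astype(np.float32)
    for _ in range(levels):
        blurred = gaussian_filter(current, sigma=1.0)
        down = blurred[::2, ::2]                                   # coarser level
        up = zoom(down, 2, order=1)[: current.shape[0], : current.shape[1]]
        pyramid.append(current - up)                               # high-frequency residual
        current = down
    pyramid.append(current)                                        # coarsest approximation
    return pyramid

def reconstruct_from_pyramid(pyramid):
    """Progressively add finer residuals back onto the coarse estimate."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        up = zoom(current, 2, order=1)[: residual.shape[0], : residual.shape[1]]
        current = up + residual
    return current

depth = np.random.rand(64, 64).astype(np.float32)                  # stand-in for a depth map
pyr = build_laplacian_pyramid(depth)
print(np.abs(reconstruct_from_pyramid(pyr) - depth).max())          # ~0: lossless round trip
```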
arXiv Detail & Related papers (2025-02-11T06:21:42Z) - DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z) - D-PAD: Deep-Shallow Multi-Frequency Patterns Disentangling for Time Series Forecasting [7.447606231770597]
We propose D-PAD, a deep-shallow multi-frequency patterns disentangling neural network for time series forecasting.
D-PAD achieves state-of-the-art performance, outperforming the best baseline by an average of 9.48% in MSE and 7.15% in MAE.
arXiv Detail & Related papers (2024-03-26T15:52:36Z) - Q-SLAM: Quadric Representations for Monocular SLAM [85.82697759049388]
We reimagine volumetric representations through the lens of quadrics.
We use a quadric assumption to rectify noisy depth estimates from RGB inputs.
We introduce a novel quadric-decomposed transformer to aggregate information across quadrics.
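As a rough picture of how a quadric assumption can rectify noisy depth, the toy sketch below fits a quadric surface to a depth patch by least squares; the helper name and patch-wise setup are assumptions, not Q-SLAM's actual pipeline.

```python
import numpy as np

def rectify_depth_with_quadric(depth_patch):
    """Fit z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to a noisy depth patch by
    least squares and return the smoothed (quadric) depth."""
    h, w = depth_patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x, y, z = xs.ravel(), ys.ravel(), depth_patch.ravel()
    A = np.stack([x**2, y**2, x * y, x, y, np.ones_like(x)], axis=1).astype(np.float64)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return (A @ coeffs).reshape(h, w)

# Usage: a noisy observation of a smooth, quadric-like surface is denoised.
h, w = 32, 32
ys, xs = np.mgrid[0:h, 0:w]
clean = 2.0 + 0.01 * xs**2 - 0.02 * xs * ys + 0.05 * ys
noisy = clean + 0.05 * np.random.randn(h, w)
print(np.abs(rectify_depth_with_quadric(noisy) - clean).mean())   # well below the noise level
```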
arXiv Detail & Related papers (2024-03-12T23:27:30Z) - Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with existing models, our method achieves competitive performance on three popular benchmark datasets.
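To illustrate the kind of low/high-frequency separation such a two-stage design builds on, here is a toy, non-learnable frequency split in the 2D Fourier domain; the function name and radial cutoff are assumptions, not the paper's learnable mechanism.

```python
import numpy as np

def split_low_high_frequency(image, radius=0.1):
    """Split an image into a low-frequency part (a proxy for the coarse-localization
    cue) and a high-frequency residual (the detail cue) with a radial mask in the
    2D Fourier domain."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]        # cycles per pixel, vertical
    fx = np.fft.fftfreq(image.shape[1])[None, :]        # cycles per pixel, horizontal
    low_mask = np.sqrt(fy**2 + fx**2) <= radius         # keep only low frequencies
    low = np.fft.ifft2(np.fft.fft2(image) * low_mask).real
    return low, image - low

img = np.random.rand(64, 64).astype(np.float32)          # stand-in for a feature map
low, high = split_low_high_frequency(img)
print(float(low.var() / img.var()))                      # share of variance in the low band
```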
arXiv Detail & Related papers (2023-08-17T11:30:46Z) - Neural Kernel Surface Reconstruction [80.51581494300423]
We present a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud.
Our approach builds upon the recently introduced Neural Kernel Fields representation.
arXiv Detail & Related papers (2023-05-31T06:25:18Z) - DARF: Depth-Aware Generalizable Neural Radiance Field [51.29437249009986]
We propose the Depth-Aware Generalizable Neural Radiance Field (DARF) with a Depth-Aware Dynamic Sampling (DADS) strategy.
Our framework infers unseen scenes at both the pixel level and the geometry level from only a few input images.
Compared with state-of-the-art generalizable NeRF methods, DARF reduces samples by 50%, while improving rendering quality and depth estimation.
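A very rough sketch of what depth-aware sampling can look like is given below; the function, its parameters, and the 3-sigma window are hypothetical and not DARF's actual DADS strategy.

```python
import numpy as np

def depth_aware_samples(near, far, depth_prior, sigma, n_coarse=32, n_fine=16):
    """Toy depth-aware sampling: a few coarse samples span [near, far], while most
    samples are concentrated in a window around a depth prior, instead of sweeping
    the whole ray densely."""
    coarse = np.linspace(near, far, n_coarse)
    lo = max(near, depth_prior - 3.0 * sigma)
    hi = min(far, depth_prior + 3.0 * sigma)
    fine = np.linspace(lo, hi, n_fine)
    return np.sort(np.concatenate([coarse, fine]))

# Usage: 48 samples replace a dense uniform sweep, most of them near the surface at 2.4 m.
t_vals = depth_aware_samples(near=0.5, far=8.0, depth_prior=2.4, sigma=0.1)
print(len(t_vals), t_vals.min(), t_vals.max())
```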
arXiv Detail & Related papers (2022-12-05T14:00:59Z) - Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z) - Pyramid Frequency Network with Spatial Attention Residual Refinement Module for Monocular Depth Estimation [4.397981844057195]
Deep-learning approaches to depth estimation are rapidly advancing, offering performance superior to existing methods.
In this work, a Pyramid Frequency Network with a Spatial Attention Residual Refinement Module is proposed to address the weak robustness of existing deep-learning methods.
PFN achieves better visual accuracy than state-of-the-art methods in both indoor and outdoor scenes on Make3D, KITTI depth, and NYUv2 datasets.
arXiv Detail & Related papers (2022-04-05T17:48:26Z) - PINs: Progressive Implicit Networks for Multi-Scale Neural Representations [68.73195473089324]
We propose a progressive positional encoding, exposing a hierarchical structure to incremental sets of frequency encodings.
Our model accurately reconstructs scenes with wide frequency bands and learns a scene representation at progressive levels of detail.
Experiments on several 2D and 3D datasets show improvements in reconstruction accuracy, representational capacity and training speed compared to baselines.
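The sketch below illustrates the general idea of progressively exposing frequency bands in a Fourier-feature positional encoding; the masking scheme and band count are assumptions, not the paper's exact encoding.

```python
import numpy as np

def positional_encoding(x, num_bands, max_bands=8):
    """Fourier-feature positional encoding that only exposes the first `num_bands`
    frequency bands; higher bands are zeroed out so detail is added progressively."""
    freqs = 2.0 ** np.arange(max_bands)                    # 1, 2, 4, ... octaves
    angles = 2.0 * np.pi * np.outer(x, freqs)              # (N, max_bands)
    features = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    mask = np.tile(np.arange(max_bands) < num_bands, 2)    # hide the unused bands
    return features * mask

x = np.linspace(0.0, 1.0, 5)
coarse = positional_encoding(x, num_bands=2)   # early stage: low frequencies only
fine = positional_encoding(x, num_bands=8)     # later stage: full frequency hierarchy
print(coarse.shape, fine.shape)                # (5, 16) (5, 16)
```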
arXiv Detail & Related papers (2022-02-09T20:33:37Z) - DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues [37.837552043766166]
We introduce DynOcc, the first depth dataset consisting of dynamic in-the-wild scenes.
Our approach leverages the cues in these dynamic scenes to infer depth relationships between points of selected video frames.
In total, our DynOcc dataset contains 22M depth pairs from 91K frames of a diverse set of videos.
arXiv Detail & Related papers (2021-03-30T22:17:36Z) - DDR-Net: Learning Multi-Stage Multi-View Stereo With Dynamic Depth Range [2.081393321765571]
We propose a Dynamic Depth Range Network (DDR-Net) to determine the depth range hypotheses dynamically.
In our DDR-Net, we first build an initial depth map at the coarsest resolution of an image across the entire depth range.
We develop a novel loss strategy, which utilizes learned dynamic depth ranges to generate refined depth maps.
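As a toy illustration of dynamic depth-range hypotheses, the sketch below narrows each pixel's hypothesis range around a coarse depth estimate for the next stage; the helper and the fixed half-width are assumptions rather than DDR-Net's learned ranges.

```python
import numpy as np

def next_stage_hypotheses(coarse_depth, half_width, n_hyps=8):
    """Given a coarse depth map, build per-pixel depth hypotheses for the next,
    finer stage inside a narrowed range around the coarse estimate."""
    lo = coarse_depth - half_width
    hi = coarse_depth + half_width
    steps = np.linspace(0.0, 1.0, n_hyps)                       # (n_hyps,)
    # (H, W, n_hyps): evenly spaced hypotheses inside each pixel's range
    return lo[..., None] + steps * (hi - lo)[..., None]

coarse = np.full((60, 80), 5.0, dtype=np.float32)               # stand-in coarse depth (m)
hyps = next_stage_hypotheses(coarse, half_width=0.5)
print(hyps.shape, hyps[0, 0])                                   # (60, 80, 8), values 4.5..5.5
```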
arXiv Detail & Related papers (2021-03-26T05:52:38Z) - Guiding Monocular Depth Estimation Using Depth-Attention Volume [38.92495189498365]
We propose guiding depth estimation to favor planar structures, which are ubiquitous, especially in indoor environments.
Experiments on two popular indoor datasets, NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results.
arXiv Detail & Related papers (2020-04-06T15:45:52Z)