DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation
- URL: http://arxiv.org/abs/2405.16960v1
- Date: Mon, 27 May 2024 08:55:17 GMT
- Title: DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation
- Authors: Mengtan Zhang, Yi Feng, Qijun Chen, Rui Fan
- Abstract summary: DCPI-Depth is a framework that couples two bidirectional, collaborative streams and incorporates three novel components: a contextual-geometric depth consistency loss, a differential property correlation loss, and a bidirectional stream co-adjustment strategy.
It achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming existing prior art.
- Score: 17.99904937160487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a recent surge of interest in learning to perceive depth from monocular videos in an unsupervised fashion. A key challenge in this field is achieving robust and accurate depth estimation in challenging scenarios, particularly in regions with weak textures or where dynamic objects are present. This study makes three major contributions by delving deeply into dense correspondence priors to provide existing frameworks with explicit geometric constraints. The first novelty is a contextual-geometric depth consistency loss, which employs depth maps triangulated from dense correspondences based on estimated ego-motion to guide the learning of depth perception from contextual information, since explicitly triangulated depth maps capture accurate relative distances among pixels. The second novelty arises from the observation that there exists an explicit, deducible relationship between optical flow divergence and depth gradient. A differential property correlation loss is therefore designed to refine depth estimation with a specific emphasis on local variations. The third novelty is a bidirectional stream co-adjustment strategy that enhances the interaction between rigid and optical flows, encouraging the former towards more accurate correspondence and making the latter more adaptable across various scenarios under the static scene hypothesis. DCPI-Depth, a framework that incorporates all these innovative components and couples two bidirectional and collaborative streams, achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming all prior art. Specifically, it demonstrates accurate depth estimation in texture-less and dynamic regions, and exhibits more reasonable smoothness.
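The abstract's second novelty asserts an explicit, deducible relationship between optical flow divergence and depth gradient. As a rough, worked illustration only (assuming a static scene, a pure relative translation $\mathbf{t} = (t_x, t_y, t_z)^\top$ of the scene with respect to the camera, normalized image coordinates $(x, y)$, and a first-order approximation in the inter-frame motion, and not necessarily the exact formulation used in DCPI-Depth), the rigid flow is $u = (t_x - x t_z)/Z$ and $v = (t_y - y t_z)/Z$, whose divergence

\[
\nabla \cdot \mathbf{f} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}
= -\frac{2 t_z}{Z} - \frac{(t_x - x t_z)\,Z_x + (t_y - y t_z)\,Z_y}{Z^{2}}
\]

couples the inverse depth $1/Z$ with the depth gradient $(Z_x, Z_y)$, i.e., exactly the kind of local differential relationship that a correlation loss on depth variations can exploit.
For the first novelty, the PyTorch-style function below is a minimal sketch of how a depth map triangulated from dense correspondences and estimated ego-motion could supervise the network-predicted depth; the function name, the median scaling, and the L1 penalty are illustrative assumptions, not the authors' implementation.

```python
import torch

def contextual_geometric_consistency_loss(pred_depth, tri_depth, valid_mask):
    # Hypothetical sketch (not the authors' code): align the network-predicted depth
    # with depth triangulated from dense correspondences and estimated ego-motion.
    # Median scaling removes the global scale ambiguity of unsupervised monocular
    # depth before the two maps are compared on successfully triangulated pixels.
    mask = valid_mask.float()
    eps = 1e-6
    scale = tri_depth[valid_mask].median() / (pred_depth[valid_mask].median() + eps)
    diff = (pred_depth * scale - tri_depth).abs()
    return (diff * mask).sum() / mask.sum().clamp(min=1.0)
```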
Related papers
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
- Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z)
- NDDepth: Normal-Distance Assisted Monocular Depth Estimation [22.37113584192617]
We propose a novel physics (geometry)-driven deep learning framework for monocular depth estimation.
We introduce a new normal-distance head that outputs pixel-level surface normal and plane-to-origin distance for deriving depth at each position.
We develop an effective contrastive iterative refinement module that refines depth in a complementary manner according to the depth uncertainty.
arXiv Detail & Related papers (2023-09-19T13:05:57Z)
- Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network [80.19054069988559]
We find that self-supervised monocular depth estimation shows direction sensitivity and environmental dependency.
We propose a new Direction-aware Cumulative Convolution Network (DaCCN), which improves the depth representation in two aspects.
Experiments show that our method achieves significant improvements on three widely used benchmarks.
arXiv Detail & Related papers (2023-08-10T14:32:18Z)
- On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
- Exploiting Correspondences with All-pairs Correlations for Multi-view Depth Estimation [19.647670347925754]
Multi-view depth estimation plays a critical role in reconstructing and understanding the 3D world.
We design a novel iterative multi-view depth estimation framework mimicking the optimization process.
We conduct extensive experiments on ScanNet, DeMoN, ETH3D, and 7Scenes to demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-05-05T07:38:31Z)
- Self-Guided Instance-Aware Network for Depth Completion and Enhancement [6.319531161477912]
Existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values.
We propose a novel self-guided instance-aware network (SG-IANet) that utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration.
arXiv Detail & Related papers (2021-05-25T19:41:38Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps (a loose structural sketch of such a pipeline is given below).
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
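The following minimal Python sketch illustrates the three-stage structure described in the entry above, assuming a generic dense-flow callable (flow_fn) and using OpenCV's five-point solver for the pose stage; the depth stage is reduced to plain triangulation, whereas the paper uses a learned, scale-invariant depth network, so this is a structural illustration only, not the authors' method.

```python
import cv2
import numpy as np

def two_view_sfm(img1, img2, K, flow_fn):
    # 1) Dense correspondences from an (assumed) optical flow estimator.
    flow = flow_fn(img1, img2)                          # H x W x 2 array
    h, w = flow.shape[:2]
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts1 = np.stack([xs, ys], -1).reshape(-1, 2).astype(np.float64)
    pts2 = pts1 + flow.reshape(-1, 2)

    # 2) Relative pose from 2D correspondences (subsampled five-point RANSAC).
    idx = np.random.choice(len(pts1), size=min(len(pts1), 2000), replace=False)
    E, inliers = cv2.findEssentialMat(pts1[idx], pts2[idx], K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1[idx], pts2[idx], K, mask=inliers)

    # 3) Depth by triangulating every correspondence (stand-in for the learned,
    #    scale-invariant depth network that exploits epipolar geometry).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # 4 x N homogeneous points
    depth = (X[2] / X[3]).reshape(h, w)
    return R, t, depth
```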
- Boundary-induced and scene-aggregated network for monocular depth prediction [20.358133522462513]
We propose the Boundary-induced and Scene-aggregated network (BS-Net) to predict the dense depth of a single RGB image.
Experimental results on the NYUD v2 and iBims-1 datasets illustrate the state-of-the-art performance of the proposed approach.
arXiv Detail & Related papers (2021-02-26T01:43:17Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)