Exploiting Correspondences with All-pairs Correlations for Multi-view Depth Estimation
- URL: http://arxiv.org/abs/2205.02481v1
- Date: Thu, 5 May 2022 07:38:31 GMT
- Title: Exploiting Correspondences with All-pairs Correlations for Multi-view Depth Estimation
- Authors: Kai Cheng, Hao Chen, Wei Yin, Guangkai Xu, Xuejin Chen
- Abstract summary: Multi-view depth estimation plays a critical role in reconstructing and understanding the 3D world.
We design a novel iterative multi-view depth estimation framework mimicking the optimization process.
We conduct sufficient experiments on ScanNet, DeMoN, ETH3D, and 7Scenes to demonstrate the superiority of our method.
- Score: 19.647670347925754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view depth estimation plays a critical role in reconstructing and
understanding the 3D world. Recent learning-based methods have made significant
progress on this task. However, multi-view depth estimation is fundamentally a
correspondence-based optimization problem, yet previous learning-based methods
mainly rely on predefined depth hypotheses to build correspondence as the cost
volume and implicitly regularize it to fit the depth prediction, deviating from
the essence of iterative optimization based on stereo correspondence. Thus, they
suffer from unsatisfactory precision and generalization capability. In this paper,
we are the first to explore more general image correlations to establish
correspondences dynamically for depth estimation. We design a novel iterative
multi-view depth estimation framework mimicking the optimization process, which
consists of 1) a correlation volume construction module that models the pixel
similarity between a reference image and source images as all-to-all
correlations; 2) a flow-based depth initialization module that estimates the
depth from the 2D optical flow; 3) a novel correlation-guided depth refinement
module that reprojects points in different views to effectively fetch relevant
correlations for further fusion and integrate the fused correlation for
iterative depth update. Without predefined depth hypotheses, the fused
correlations establish multi-view correspondence in an efficient way and guide
the depth refinement heuristically. We conduct extensive experiments on
ScanNet, DeMoN, ETH3D, and 7Scenes to demonstrate the superiority of our method
on multi-view depth estimation and its strong generalization ability.
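The first module, which models pixel similarity between the reference and source images as all-to-all correlations, can be sketched in a few lines. The dense per-pixel feature maps, their shapes, and the 1/sqrt(C) scaling below are illustrative assumptions (borrowed from common correlation-volume practice), not the authors' exact design:

```python
import numpy as np

def all_pairs_correlation(ref_feat, src_feat):
    """All-to-all correlation between a reference and a source feature map.

    ref_feat, src_feat: (H, W, C) arrays of per-pixel features.
    Returns an (H*W, H*W) matrix whose entry (i, j) is the dot-product
    similarity between reference pixel i and source pixel j.
    """
    H, W, C = ref_feat.shape
    r = ref_feat.reshape(H * W, C)
    s = src_feat.reshape(-1, C)
    # Scale by 1/sqrt(C), a common normalization for correlation volumes
    # (an assumption here, not taken from the paper).
    return (r @ s.T) / np.sqrt(C)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16)).astype(np.float32)
corr = all_pairs_correlation(feat, feat)
print(corr.shape)  # (64, 64)
```

Because no depth hypotheses enter this construction, the same volume can be indexed repeatedly as the depth estimate is refined.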
Related papers
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
In this paper, we present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation [17.99904937160487]
DCPI-Depth is a framework that incorporates all these innovative components and couples two bidirectional and collaborative streams.
It achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming all existing prior arts.
arXiv Detail & Related papers (2024-05-27T08:55:17Z)
- FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene [57.26600120397529]
It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes.
We develop a focal-and-scale depth estimation model to learn absolute depth maps well from single images in unseen indoor scenes.
arXiv Detail & Related papers (2023-07-27T04:49:36Z)
- Depth Refinement for Improved Stereo Reconstruction [13.941756438712382]
Current techniques for depth estimation from stereoscopic images still suffer from a built-in drawback.
A simple analysis reveals that the depth error is quadratically proportional to the object's distance.
We propose a simple but effective method that uses a refinement network for depth estimation.
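The quadratic growth follows directly from the stereo depth equation Z = f·b/d (f: focal length in pixels, b: baseline, d: disparity): a fixed disparity error δd maps to a depth error of roughly Z²/(f·b)·δd. A minimal sketch with illustrative camera values (f, b, and δd are assumptions, not from the paper):

```python
def depth_error(Z, f, b, delta_d):
    # Z = f * b / d  =>  |dZ/dd| = f * b / d**2 = Z**2 / (f * b),
    # so a fixed disparity error delta_d grows quadratically with distance Z.
    return Z ** 2 / (f * b) * delta_d

f_px, baseline, dd = 700.0, 0.12, 0.25  # illustrative camera values
e_near = depth_error(2.0, f_px, baseline, dd)
e_far = depth_error(4.0, f_px, baseline, dd)
print(e_far / e_near)  # doubling the distance quadruples the depth error
```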
arXiv Detail & Related papers (2021-12-15T12:21:08Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the local computation of depth maps with a deep MVS technique, and 2) the fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
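The TSDF construction in the second step is conventionally a per-voxel running weighted average (the classic KinectFusion-style update); VolumeFusion fuses learned features, so the following is only a sketch of that conventional baseline, not the paper's method:

```python
import numpy as np

def fuse_tsdf(tsdf, weight, new_tsdf, new_weight):
    """Classic per-voxel TSDF fusion: a running weighted average of the
    signed-distance observations, with accumulated weights."""
    w_sum = weight + new_weight
    fused = (tsdf * weight + new_tsdf * new_weight) / w_sum
    return fused, w_sum

shape = (4, 4, 4)                       # tiny illustrative voxel grid
vol, w = np.zeros(shape), np.zeros(shape)
# Fuse two depth-map observations of the same grid.
vol, w = fuse_tsdf(vol, w, np.full(shape, 0.5), np.ones(shape))
vol, w = fuse_tsdf(vol, w, np.full(shape, 0.3), np.ones(shape))
print(float(vol[0, 0, 0]), float(w[0, 0, 0]))  # 0.4 2.0
```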
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences, and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
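The step from 2D correspondences to depth can be illustrated with classic linear (DLT) triangulation of a single correspondence, given the relative pose; the cameras and point below are illustrative values, not the paper's setup:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.

    P1, P2: 3x4 projection matrices; x1, x2: (u, v) image coordinates.
    Each view contributes two rows of the homogeneous system A @ X = 0;
    the solution is the right singular vector of A with smallest singular value.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # de-homogenize

# Two normalized cameras: identity pose, and a unit translation along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = (X_true / X_true[2])[:2]                       # projection in view 1
X_c2 = X_true + np.array([-1.0, 0.0, 0.0])          # point in camera-2 frame
x2 = (X_c2 / X_c2[2])[:2]                           # projection in view 2
X_est = triangulate(P1, P2, x1, x2)
print(X_est)  # recovers approximately [0.5, 0.2, 4.0]
```

With dense optical flow, the same construction applies per pixel, which is why a well-posed flow plus a relative pose suffices to initialize depth.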
arXiv Detail & Related papers (2021-04-01T15:31:20Z) - Depth-conditioned Dynamic Message Propagation for Monocular 3D Object
Detection [86.25022248968908]
We learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.