A Confidence-based Iterative Solver of Depths and Surface Normals for
Deep Multi-view Stereo
- URL: http://arxiv.org/abs/2201.07609v1
- Date: Wed, 19 Jan 2022 14:08:45 GMT
- Title: A Confidence-based Iterative Solver of Depths and Surface Normals for
Deep Multi-view Stereo
- Authors: Wang Zhao, Shaohui Liu, Yi Wei, Hengkai Guo, Yong-Jin Liu
- Abstract summary: We introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps.
The key to our approach is a novel solver that iteratively solves for per-view depth map and normal map.
Our proposed solver consistently improves the depth quality over both conventional and deep learning based MVS pipelines.
- Score: 41.527018997251744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a deep multi-view stereo (MVS) system that
jointly predicts depths, surface normals and per-view confidence maps. The key
to our approach is a novel solver that iteratively solves for per-view depth
map and normal map by optimizing an energy potential based on the locally
planar assumption. Specifically, the algorithm updates depth map by propagating
from neighboring pixels with slanted planes, and updates normal map with local
probabilistic plane fitting. Both two steps are monitored by a customized
confidence map. This solver is not only effective as a post-processing tool for
plane-based depth refinement and completion, but also differentiable such that
it can be efficiently integrated into deep learning pipelines. Our multi-view
stereo system employs multiple optimization steps of the solver over the
initial prediction of depths and surface normals. The whole system can be
trained end-to-end, decoupling the challenging problem of matching pixels
within poorly textured regions from the cost-volume based neural network.
Experimental results on ScanNet and RGB-D Scenes V2 demonstrate
state-of-the-art performance of the proposed deep MVS system on multi-view
depth estimation, with our proposed solver consistently improving the depth
quality over both conventional and deep learning based MVS pipelines. Code is
available at https://github.com/thuzhaowang/idn-solver.
Related papers
- ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive
depth range and depth interval [19.28042366225802]
Multi-View Stereo(MVS) is a fundamental problem in geometric computer vision.
We present a novel multi-stage coarse-to-fine framework to achieve adaptive all-pixel depth range and depth interval.
Our model achieves state-of-the-art performance and yields competitive generalization ability.
arXiv Detail & Related papers (2023-08-17T14:52:11Z) - Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth
Approach with Saddle-shaped Depth Cells [23.345139129458122]
We show that different depth geometries have significant performance gaps, even using the same depth prediction error.
We introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface.
Our method also points to a new research direction for considering depth geometry in MVS.
arXiv Detail & Related papers (2023-07-18T11:37:53Z) - Towards the Probabilistic Fusion of Learned Priors into Standard
Pipelines for 3D Reconstruction [31.55322925389011]
We train a deep neural network to predict discrete, nonparametric probability distributions for the depth of each pixel from a single image.
We then fuse this "probability volume" with another probability volume based on the photometric consistency between subsequent frames and the image.
arXiv Detail & Related papers (2022-07-27T11:28:49Z) - Non-parametric Depth Distribution Modelling based Depth Inference for
Multi-view Stereo [43.415242967722804]
Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo.
In general, those approaches assume that the depth of each pixel follows a unimodal distribution.
We propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.
arXiv Detail & Related papers (2022-05-08T05:13:04Z) - 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z) - VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two stages framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the local computation of the local depth maps with a deep MVS technique, and, 2) the depth maps and images' features fusion to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z) - PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View
Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $delta1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z) - CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z) - Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for
3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multiview images.
We introduce a coarseto-fine depth inference strategy to achieve high resolution depth.
arXiv Detail & Related papers (2020-11-25T13:34:11Z) - OmniSLAM: Omnidirectional Localization and Dense Mapping for
Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and light-weighted deep neural networks for the omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.