Related papers: A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

URL: http://arxiv.org/abs/2201.07609v1
Date: Wed, 19 Jan 2022 14:08:45 GMT
Title: A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo
Authors: Wang Zhao, Shaohui Liu, Yi Wei, Hengkai Guo, Yong-Jin Liu
Abstract summary: We introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps. The key to our approach is a novel solver that iteratively solves for per-view depth map and normal map. Our proposed solver consistently improves the depth quality over both conventional and deep learning based MVS pipelines.
Score: 41.527018997251744
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps. The key to our approach is a novel solver that iteratively solves for per-view depth map and normal map by optimizing an energy potential based on the locally planar assumption. Specifically, the algorithm updates depth map by propagating from neighboring pixels with slanted planes, and updates normal map with local probabilistic plane fitting. Both two steps are monitored by a customized confidence map. This solver is not only effective as a post-processing tool for plane-based depth refinement and completion, but also differentiable such that it can be efficiently integrated into deep learning pipelines. Our multi-view stereo system employs multiple optimization steps of the solver over the initial prediction of depths and surface normals. The whole system can be trained end-to-end, decoupling the challenging problem of matching pixels within poorly textured regions from the cost-volume based neural network. Experimental results on ScanNet and RGB-D Scenes V2 demonstrate state-of-the-art performance of the proposed deep MVS system on multi-view depth estimation, with our proposed solver consistently improving the depth quality over both conventional and deep learning based MVS pipelines. Code is available at https://github.com/thuzhaowang/idn-solver.

Related papers

ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive depth range and depth interval [19.28042366225802]
Multi-View Stereo(MVS) is a fundamental problem in geometric computer vision. We present a novel multi-stage coarse-to-fine framework to achieve adaptive all-pixel depth range and depth interval. Our model achieves state-of-the-art performance and yields competitive generalization ability.
arXiv Detail & Related papers (2023-08-17T14:52:11Z)
Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells [23.345139129458122]
We show that different depth geometries have significant performance gaps, even using the same depth prediction error. We introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface. Our method also points to a new research direction for considering depth geometry in MVS.
arXiv Detail & Related papers (2023-07-18T11:37:53Z)
Towards the Probabilistic Fusion of Learned Priors into Standard Pipelines for 3D Reconstruction [31.55322925389011]
We train a deep neural network to predict discrete, nonparametric probability distributions for the depth of each pixel from a single image. We then fuse this "probability volume" with another probability volume based on the photometric consistency between subsequent frames and the image.
arXiv Detail & Related papers (2022-07-27T11:28:49Z)
Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo [43.415242967722804]
Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo. In general, those approaches assume that the depth of each pixel follows a unimodal distribution. We propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.
arXiv Detail & Related papers (2022-05-08T05:13:04Z)
3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions. We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two stages framework with deep neural networks improves both the interpretability and the accuracy of the results. Our network operates in two steps: 1) the local computation of the local depth maps with a deep MVS technique, and, 2) the depth maps and images' features fusion to build a single TSDF volume. In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net. Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $delta1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system. We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction. We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multiview images. We introduce a coarseto-fine depth inference strategy to achieve high resolution depth.
arXiv Detail & Related papers (2020-11-25T13:34:11Z)
OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras. For more practical and accurate reconstruction, we first introduce improved and light-weighted deep neural networks for the omnidirectional depth estimation. We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.