Non-parametric Depth Distribution Modelling based Depth Inference for
Multi-view Stereo
- URL: http://arxiv.org/abs/2205.03783v1
- Date: Sun, 8 May 2022 05:13:04 GMT
- Title: Non-parametric Depth Distribution Modelling based Depth Inference for
Multi-view Stereo
- Authors: Jiayu Yang, Jose M. Alvarez, Miaomiao Liu
- Abstract summary: Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo.
In general, those approaches assume that the depth of each pixel follows a unimodal distribution.
We propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.
- Score: 43.415242967722804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent cost volume pyramid based deep neural networks have unlocked the
potential of efficiently leveraging high-resolution images for depth inference
from multi-view stereo. In general, those approaches assume that the depth of
each pixel follows a unimodal distribution. Boundary pixels, however, usually
follow a multi-modal distribution because they cover surfaces at different
depths; the unimodal assumption therefore produces erroneous depth predictions
at the coarsest level of the cost volume pyramid that cannot be corrected at
the refinement levels. In contrast, we propose constructing the cost
volume by non-parametric depth distribution modeling to handle pixels with
unimodal and multi-modal distributions. Our approach outputs multiple depth
hypotheses at the coarser level to avoid errors in the early stage. As we
perform a local search around these multiple hypotheses at subsequent levels, our
approach does not preserve a rigid spatial ordering of depth hypotheses; we
therefore introduce a sparse cost aggregation network to derive information within
each cost volume. We evaluate our approach extensively on two benchmark datasets: DTU and
Tanks & Temples. Our experimental results show that our model outperforms
existing methods by a large margin and achieves superior performance on
boundary regions. Code is available at https://github.com/NVlabs/NP-CVP-MVSNet
Related papers
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- Progressive Depth Decoupling and Modulating for Flexible Depth Completion [28.693100885012008]
Image-guided depth completion aims at generating a dense depth map from sparse LiDAR data and an RGB image.
Recent methods have shown promising performance by reformulating it as a classification problem with two sub-tasks: depth discretization and probability prediction.
We propose a progressive depth decoupling and modulating network, which incrementally decouples the depth range into bins and adaptively generates multi-scale dense depth maps.
arXiv Detail & Related papers (2024-05-15T13:45:33Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation [23.22005119986485]
DiffusionDepth is a new approach that reformulates monocular depth estimation as a denoising diffusion process.
It learns an iterative denoising process to 'denoise' a random depth distribution into a depth map with the guidance of monocular visual conditions.
Experimental results on KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion approach could reach state-of-the-art performance in both indoor and outdoor scenarios with acceptable inference time.
arXiv Detail & Related papers (2023-03-09T03:48:24Z)
- A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo [41.527018997251744]
We introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps.
The key to our approach is a novel solver that iteratively solves for per-view depth map and normal map.
Our proposed solver consistently improves the depth quality over both conventional and deep learning based MVS pipelines.
arXiv Detail & Related papers (2022-01-19T14:08:45Z)
- DDR-Net: Learning Multi-Stage Multi-View Stereo With Dynamic Depth Range [2.081393321765571]
We propose a Dynamic Depth Range Network (DDR-Net) to determine the depth range hypotheses dynamically.
In our DDR-Net, we first build an initial depth map at the coarsest resolution of an image across the entire depth range.
We develop a novel loss strategy, which utilizes learned dynamic depth ranges to generate refined depth maps.
arXiv Detail & Related papers (2021-03-26T05:52:38Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta_1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
- Direct Depth Learning Network for Stereo Matching [79.3665881702387]
A novel Direct Depth Learning Network (DDL-Net) is designed for stereo matching.
DDL-Net consists of two stages: the Coarse Depth Estimation stage and the Adaptive-Grained Depth Refinement stage.
We show that DDL-Net achieves an average improvement of 25% on the SceneFlow dataset and 12% on the DrivingStereo dataset.
arXiv Detail & Related papers (2020-12-10T10:33:57Z)
- Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multiview images.
We introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth maps.
arXiv Detail & Related papers (2020-11-25T13:34:11Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
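Several entries above (e.g. the progressive depth decoupling work) reformulate depth regression as classification over discretized depth bins followed by a probability-weighted expectation. A minimal sketch of that general idea, assuming a uniform discretization and hypothetical names, not any specific paper's implementation:

```python
import numpy as np

def discretize_depth(d_min, d_max, num_bins):
    """Uniformly split [d_min, d_max] into bins and return their centers."""
    edges = np.linspace(d_min, d_max, num_bins + 1)
    return 0.5 * (edges[:-1] + edges[1:])

def expected_depth(logits, bin_centers):
    """Soft-argmax: softmax over bin logits, then a weighted mean of centers."""
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    return float(np.dot(p, bin_centers))

centers = discretize_depth(0.0, 10.0, 5)      # bin centers: [1, 3, 5, 7, 9]
logits = np.array([0.0, 4.0, 0.0, 0.0, 0.0])  # network is confident in bin 1
print(expected_depth(logits, centers))        # near 3, pulled slightly toward the other bins
```

The soft-argmax keeps the prediction differentiable and sub-bin accurate for unimodal distributions; it is also the step that the main paper above avoids at the coarsest level, since averaging a multi-modal distribution this way yields depths on neither surface.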
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.