ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive
depth range and depth interval
- URL: http://arxiv.org/abs/2308.09022v1
- Date: Thu, 17 Aug 2023 14:52:11 GMT
- Title: ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive
depth range and depth interval
- Authors: Song Zhang, Wenjia Xu, Zhiwei Wei, Lili Zhang, Yang Wang, Junyi Liu
- Abstract summary: Multi-View Stereo (MVS) is a fundamental problem in geometric computer vision.
We present a novel multi-stage coarse-to-fine framework to achieve adaptive all-pixel depth range and depth interval.
Our model achieves state-of-the-art performance and yields competitive generalization ability.
- Score: 19.28042366225802
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-View Stereo (MVS) is a fundamental problem in geometric computer vision
which aims to reconstruct a scene using multi-view images with known camera
parameters. However, mainstream approaches represent the scene with a fixed
all-pixel depth range and an equal depth interval partition, which results in
inadequate utilization of depth planes and imprecise depth estimation. In this
paper, we present a novel multi-stage coarse-to-fine framework to achieve
adaptive all-pixel depth range and depth interval. We predict a coarse depth
map in the first stage; then, in the second stage, an Adaptive Depth Range
Prediction module zooms in on the scene by leveraging the reference image and
the depth map obtained in the first stage, and predicts a more accurate
all-pixel depth range for the following stages. In the third and fourth stages,
we propose an Adaptive Depth Interval Adjustment module to achieve adaptive
variable interval partition of the pixel-wise depth range. The depth interval
distribution in this module is normalized by a Z-score, which allocates dense
depth hypothesis planes around the potential ground-truth depth value and
sparser planes farther from it, achieving more accurate depth estimation.
Extensive experiments on four widely used benchmark datasets (DTU, TnT,
BlendedMVS, ETH3D) demonstrate that
our model achieves state-of-the-art performance and yields competitive
generalization ability. Particularly, our method achieves the highest Acc and
Overall on the DTU dataset, while attaining the highest Recall and
$F_{1}$-score on the Tanks and Temples intermediate and advanced datasets.
Moreover, our method also achieves the lowest $e_{1}$ and $e_{3}$ on the
BlendedMVS dataset and the highest Acc and $F_{1}$-score on the ETH3D dataset,
surpassing all listed methods. Project website:
https://github.com/zs670980918/ARAI-MVSNet
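The abstract describes two mechanisms but gives no code here. Below is a minimal NumPy sketch, under assumed function names and a simplified weighting, of (1) shrinking the all-pixel depth range around a coarse depth map and (2) spacing per-pixel depth hypotheses non-uniformly so that planes cluster near the current estimate. The quadratic warp merely stands in for the paper's Z-score-normalized interval distribution; it is not the authors' implementation.

```python
import numpy as np

def adaptive_depth_range(coarse_depth, k=1.5):
    # Hypothetical stand-in for Adaptive Depth Range Prediction:
    # shrink the fixed all-pixel range to a band around the coarse map.
    mu, sigma = coarse_depth.mean(), coarse_depth.std()
    return mu - k * sigma, mu + k * sigma

def adaptive_depth_hypotheses(coarse_depth, d_min, d_max, n_planes=16):
    # Per-pixel hypothesis planes, denser near the coarse estimate.
    # A quadratic warp of uniform offsets stands in for the paper's
    # Z-score-normalized interval distribution (illustrative only).
    t = np.linspace(-1.0, 1.0, n_planes)           # uniform offsets
    warped = np.sign(t) * t ** 2                   # dense near 0, sparse at ends
    half = 0.5 * (d_max - d_min)
    planes = coarse_depth[None] + half * warped[:, None, None]
    return np.clip(planes, d_min, d_max)           # (n_planes, H, W)

coarse = np.random.uniform(425.0, 935.0, size=(32, 40))  # fake DTU-like depths
lo, hi = adaptive_depth_range(coarse)
hyp = adaptive_depth_hypotheses(coarse, lo, hi)
print(hyp.shape)  # (16, 32, 40)
```

Sweeping these per-pixel planes in the later stages concentrates the hypothesis budget where the true depth is most likely to lie, which is the point of the adaptive interval partition.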
Related papers
- Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth
Approach with Saddle-shaped Depth Cells [23.345139129458122]
We show that different depth geometries exhibit significant performance gaps, even under the same depth prediction error.
We introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface.
Our method also points to a new research direction for considering depth geometry in MVS.
arXiv Detail & Related papers (2023-07-18T11:37:53Z) - Depthformer : Multiscale Vision Transformer For Monocular Depth
Estimation With Local Global Information Fusion [6.491470878214977]
This paper benchmarks various transformer-based models for the depth estimation task on the indoor NYUv2 dataset and the outdoor KITTI dataset.
We propose a novel attention-based architecture, Depthformer, for monocular depth estimation.
Our proposed method improves the state of the art by 3.3% on NYUv2 and 3.3% on KITTI in terms of Root Mean Squared Error (RMSE).
arXiv Detail & Related papers (2022-07-10T20:49:11Z) - P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior [133.76192155312182]
We propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth.
An extensive evaluation of our method shows that we set the new state of the art in supervised monocular depth estimation.
arXiv Detail & Related papers (2022-04-05T10:03:52Z) - A Confidence-based Iterative Solver of Depths and Surface Normals for
Deep Multi-view Stereo [41.527018997251744]
We introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps.
The key to our approach is a novel solver that iteratively solves for the per-view depth and normal maps.
Our proposed solver consistently improves the depth quality over both conventional and deep learning based MVS pipelines.
arXiv Detail & Related papers (2022-01-19T14:08:45Z) - 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z) - VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume (a classical TSDF-fusion sketch appears after this list).
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present a multi-view 3D pose estimation approach based on plane sweep stereo that jointly addresses cross-view fusion and 3D pose reconstruction in a single shot (a plane-sweep warping sketch appears after this list).
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - DDR-Net: Learning Multi-Stage Multi-View Stereo With Dynamic Depth Range [2.081393321765571]
We propose a Dynamic Depth Range Network (DDR-Net) to determine the depth range hypotheses dynamically.
In our DDR-Net, we first build an initial depth map at the coarsest resolution of an image across the entire depth range.
We develop a novel loss strategy, which utilizes learned dynamic depth ranges to generate refined depth maps.
arXiv Detail & Related papers (2021-03-26T05:52:38Z) - PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View
Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta_{1}$ metric on the KITTI dataset (a sketch of this metric appears after this list).
arXiv Detail & Related papers (2021-03-12T15:54:46Z) - Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for
3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images.
We introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth.
arXiv Detail & Related papers (2020-11-25T13:34:11Z) - OmniSLAM: Omnidirectional Localization and Dense Mapping for
Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multi-view stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)