Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement
- URL: http://arxiv.org/abs/2003.13017v1
- Date: Sun, 29 Mar 2020 13:31:00 GMT
- Title: Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement
- Authors: Zehao Yu, Shenghua Gao
- Abstract summary: This paper presents Fast-MVSNet, a novel sparse-to-dense, coarse-to-fine framework for fast and accurate depth estimation in MVS.
Specifically, Fast-MVSNet first constructs a sparse cost volume for learning a sparse, high-resolution depth map, which a small convolutional network then densifies via learned propagation.
Finally, a simple but efficient Gauss-Newton layer further optimizes the depth map.
- Score: 46.8514966956438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Almost all previous deep learning-based multi-view stereo (MVS)
approaches focus on improving reconstruction quality. Besides quality,
efficiency is also desirable for MVS in real scenarios. Towards this end, this
paper presents Fast-MVSNet, a novel sparse-to-dense, coarse-to-fine framework
for fast and accurate depth estimation in MVS. Specifically, Fast-MVSNet first
constructs a sparse cost volume for learning a sparse, high-resolution depth
map. A small-scale convolutional neural network then encodes the depth
dependencies of pixels within a local region to densify the sparse
high-resolution depth map. Finally, a simple but efficient Gauss-Newton layer
further optimizes the depth map. On the one hand, the high-resolution depth
map, the data-adaptive propagation method, and the Gauss-Newton layer jointly
guarantee the effectiveness of our method. On the other hand, all modules in
Fast-MVSNet are lightweight, which guarantees its efficiency. Our approach is
also memory-friendly thanks to the sparse depth representation. Extensive
experiments show that our method is 5$\times$ and 14$\times$ faster than
Point-MVSNet and R-MVSNet, respectively, while achieving comparable or even
better results on the challenging Tanks and Temples dataset as well as the DTU
dataset. Code is available at https://github.com/svip-lab/FastMVSNet.
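The Gauss-Newton layer casts per-pixel depth refinement as a tiny nonlinear least-squares problem over feature residuals, so each iteration is a closed-form update $d \leftarrow d - (J^T J)^{-1} J^T r$. Below is a minimal PyTorch sketch of one such update; the `residual_fn` interface and the finite-difference Jacobian are illustrative assumptions, not the paper's implementation (which differentiates the warped features analytically).

```python
import torch

def gauss_newton_step(residual_fn, depth, eps=1e-2, damping=1e-6):
    """One Gauss-Newton update for a per-pixel scalar depth map.

    residual_fn: maps a depth map (B,1,H,W) to feature residuals
    (B,C,H,W), e.g. reference features minus source features warped
    at that depth. Since depth is scalar per pixel, the normal
    equations reduce to a per-pixel scalar division.
    """
    r = residual_fn(depth)                                    # (B,C,H,W)
    # Central-difference Jacobian dr/dd (the paper uses analytic derivatives).
    J = (residual_fn(depth + eps) - residual_fn(depth - eps)) / (2 * eps)
    num = (J * r).sum(dim=1, keepdim=True)                    # J^T r
    den = (J * J).sum(dim=1, keepdim=True) + damping          # J^T J (damped)
    return depth - num / den                                  # GN update

# Toy check with a linear residual (converges in one step):
target = torch.full((1, 8, 16, 16), 2.0)
residual_fn = lambda d: d.expand(-1, 8, -1, -1) - target      # zero at d = 2
depth = torch.full((1, 1, 16, 16), 1.0)
depth = gauss_newton_step(residual_fn, depth)
print(depth.mean())                                           # ~2.0
```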
Related papers
- RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo [21.209964556493368]
RayMVSNet learns sequential prediction of a 1D implicit field along each camera ray, with the zero-crossing point indicating scene depth.
RayMVSNet++ achieves state-of-the-art performance on the ScanNet dataset.
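Extracting depth from such a 1D implicit field reduces to locating the field's sign change along the ray. A minimal sketch, assuming monotonically increasing sample depths and one signed field value per sample; the linear interpolation here is illustrative, not RayMVSNet's exact readout:

```python
import torch

def zero_crossing_depth(depths, field):
    """Depth at the first sign change of a sampled 1D implicit field.

    depths: (N,) increasing sample depths along one camera ray
    field:  (N,) signed field values at those samples; the surface
            lies where the field crosses zero, so we interpolate
            linearly between the two bracketing samples.
    """
    sign_change = (field[:-1] * field[1:]) <= 0
    i = int(torch.nonzero(sign_change)[0])        # first bracketing interval
    d0, d1 = depths[i], depths[i + 1]
    f0, f1 = field[i], field[i + 1]
    t = f0 / (f0 - f1)                            # zero of the linear segment
    return d0 + t * (d1 - d0)

samples = torch.linspace(0.5, 2.5, 9)
print(zero_crossing_depth(samples, samples - 1.3))  # tensor(1.3000)
```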
arXiv Detail & Related papers (2023-07-16T02:10:47Z)
- SimpleMapping: Real-Time Visual-Inertial Dense Mapping with Deep Multi-View Stereo [13.535871843518953]
We present a real-time, high-quality visual-inertial dense mapping method that uses only monocular images and IMU readings.
We propose a sparse point aided stereo neural network (SPA-MVSNet) that can effectively leverage the informative but noisy sparse points from the VIO system.
Our proposed dense mapping system achieves a 39.7% improvement in F-score over existing systems when evaluated on the challenging scenarios of the EuRoC dataset.
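A simple way to hand such noisy but informative VIO landmarks to a depth network is to project them into the reference view as a sparse depth image. A hypothetical sketch (not SPA-MVSNet's actual interface), assuming world-frame points, pinhole intrinsics K, and a world-to-camera transform T_cw:

```python
import torch

def sparse_depth_prior(points_w, K, T_cw, H, W):
    """Rasterize sparse 3D points into a per-pixel depth prior (0 = empty).

    points_w: (N,3) world-frame landmarks     K:    (3,3) intrinsics
    T_cw:     (4,4) world-to-camera pose      H, W: image size
    """
    pts = (T_cw[:3, :3] @ points_w.T + T_cw[:3, 3:]).T   # to camera frame
    z = pts[:, 2]
    uv = (K @ pts.T).T[:, :2] / z.unsqueeze(1)           # pinhole projection
    u = uv[:, 0].round().long()
    v = uv[:, 1].round().long()
    ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    prior = torch.zeros(H, W)
    prior[v[ok], u[ok]] = z[ok]      # collisions: last point wins (sketch)
    return prior
```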
arXiv Detail & Related papers (2023-06-14T17:28:45Z)
- SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow [88.97790684009979]
A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation.
We propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels.
We also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps.
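The heart of such a flow-alignment step is warping one feature map by a predicted per-pixel 2D offset before fusing it with the other. A minimal PyTorch sketch, assuming the low-resolution features were already bilinearly upsampled to the target size; the function name and interface are illustrative, not SFNet's exact FAM:

```python
import torch
import torch.nn.functional as F

def flow_align(feat, flow):
    """Warp feat (B,C,H,W) by a per-pixel flow (B,2,H,W) in pixel units."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat)  # (2,H,W), (x,y) order
    grid = grid.unsqueeze(0) + flow                       # apply offsets
    # Normalize to [-1, 1] as grid_sample expects (x over W, y over H).
    grid[:, 0] = 2 * grid[:, 0] / (W - 1) - 1
    grid[:, 1] = 2 * grid[:, 1] / (H - 1) - 1
    return F.grid_sample(feat, grid.permute(0, 2, 3, 1), align_corners=True)
```

With zero flow this reduces to the identity warp, which makes the module easy to sanity-check before training the flow predictor.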
arXiv Detail & Related papers (2022-07-10T08:25:47Z)
- A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo [41.527018997251744]
We introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps.
The key to our approach is a novel solver that iteratively solves for per-view depth map and normal map.
Our proposed solver consistently improves the depth quality over both conventional and deep learning based MVS pipelines.
arXiv Detail & Related papers (2022-01-19T14:08:45Z)
- Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
arXiv Detail & Related papers (2021-12-11T14:41:05Z)
- IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo [71.84742490020611]
IterMVS is a new data-driven method for high-resolution multi-view stereo.
We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state.
We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
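The estimator's hidden state can be maintained with a GRU whose gates are convolutions, so each pixel carries a recurrent belief that a small head can decode into a probability distribution over depth hypotheses. A minimal sketch of such a convolutional GRU cell (layer sizes and naming are illustrative, not IterMVS's exact architecture):

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """GRU update on 2D feature maps: h and x are (B,C,H,W) tensors."""
    def __init__(self, hidden_ch, input_ch, k=3):
        super().__init__()
        self.convz = nn.Conv2d(hidden_ch + input_ch, hidden_ch, k, padding=k // 2)
        self.convr = nn.Conv2d(hidden_ch + input_ch, hidden_ch, k, padding=k // 2)
        self.convq = nn.Conv2d(hidden_ch + input_ch, hidden_ch, k, padding=k // 2)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))                  # update gate
        r = torch.sigmoid(self.convr(hx))                  # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q                         # new hidden state

# A 1x1 conv head with a softmax over D depth bins could then read a per-pixel
# depth distribution out of h (an assumption, not IterMVS verbatim).
```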
arXiv Detail & Related papers (2021-12-09T18:58:02Z)
- CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
- Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module that extracts dense feature maps of original size with multi-scale context information, and 2) an HU-LSTM (Hybrid U-LSTM) that regularizes the 3D matching volume into a predicted depth map.
Our method is competitive with the state of the art while dramatically reducing memory consumption, requiring only 19.4% of R-MVSNet's memory.
arXiv Detail & Related papers (2020-07-21T14:59:59Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)