IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo
- URL: http://arxiv.org/abs/2112.05126v1
- Date: Thu, 9 Dec 2021 18:58:02 GMT
- Title: IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo
- Authors: Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Marc Pollefeys
- Abstract summary: IterMVS is a new data-driven method for high-resolution multi-view stereo.
We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state.
We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
- Score: 71.84742490020611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present IterMVS, a new data-driven method for high-resolution multi-view
stereo. We propose a novel GRU-based estimator that encodes pixel-wise
probability distributions of depth in its hidden state. Ingesting multi-scale
matching information, our model refines these distributions over multiple
iterations and infers depth and confidence. To extract the depth maps, we
combine traditional classification and regression in a novel manner. We verify
the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
While being the most efficient method in both memory and run-time, our model
achieves competitive performance on DTU and better generalization ability on
Tanks&Temples as well as ETH3D than most state-of-the-art methods. Code is
available at https://github.com/FangjinhuaWang/IterMVS.
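To make the hybrid readout concrete, here is a minimal sketch of one way to combine classification and regression when extracting depth and confidence from a per-pixel probability distribution over D depth hypotheses, in the spirit of the abstract. This is not the authors' code: the function name, tensor shapes, and window radius r are illustrative assumptions.

```python
# Minimal sketch, assuming a [B, D, H, W] probability volume over D depth
# hypotheses; names, shapes, and the window radius are assumptions, not
# the authors' API.
import torch

def hybrid_depth_readout(prob, depth_values, r=2):
    """Combine classification and regression to read depth and confidence
    out of a per-pixel probability distribution.

    prob:         [B, D, H, W], softmax-normalized over dim 1.
    depth_values: [D], the sampled depth hypotheses.
    """
    B, D, H, W = prob.shape
    # Classification step: pick the most likely hypothesis per pixel.
    best = prob.argmax(dim=1, keepdim=True)                    # [B, 1, H, W]
    # Local window of hypothesis indices around the argmax.
    offsets = torch.arange(-r, r + 1, device=prob.device).view(1, -1, 1, 1)
    idx = (best + offsets).clamp(0, D - 1)                     # [B, 2r+1, H, W]
    # Gather the window's probabilities and depth hypotheses.
    p_win = torch.gather(prob, 1, idx)
    d_all = depth_values.to(prob.device).view(1, D, 1, 1).expand(B, D, H, W)
    d_win = torch.gather(d_all, 1, idx)
    # Regression step: expectation over the renormalized local window.
    p_norm = p_win / p_win.sum(dim=1, keepdim=True).clamp_min(1e-8)
    depth = (p_norm * d_win).sum(dim=1)                        # [B, H, W]
    # Confidence: probability mass captured by the window.
    confidence = p_win.sum(dim=1)                              # [B, H, W]
    return depth, confidence
```

The intuition behind such a hybrid: a full soft-argmax (pure regression) over a multi-modal distribution can land between modes, while pure classification is limited to the resolution of the hypothesis grid; restricting the expectation to a window around the mode keeps sub-bin accuracy without mixing modes.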
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that removes the need for a depth-range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the matching quality of the current pixel against sampled points on the epipolar line of the source image (a generic epipolar-sampling sketch appears at the end of this page).
(arXiv 2024-11-04)
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
(arXiv 2024-08-26)
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical-coordinate gradient descent on normals with a pixelwise search interval on depths.
(arXiv 2024-01-12)
- MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View Stereo [7.130834755320434]
We propose a resilient and effective multi-view stereo approach (MP-MVS).
We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth in untextured areas.
In contrast with other multi-scale approaches, mPM is faster and can be easily extended to other PatchMatch-based MVS approaches.
(arXiv 2023-09-23)
- Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
(arXiv 2021-12-11)
- PatchmatchNet: Learned Multi-View Patchmatch Stereo [70.14789588576438]
We present PatchmatchNet, a novel and learnable cascade formulation of Patchmatch for high-resolution multi-view stereo.
With high speed and a low memory requirement, PatchmatchNet can process higher-resolution imagery and is better suited to resource-limited devices than competitors that employ 3D cost-volume regularization.
(arXiv 2020-12-02)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
(arXiv 2020-11-26)
- Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement [46.8514966956438]
This paper presents Fast-MVSNet, a novel sparse-to-dense, coarse-to-fine framework for fast and accurate depth estimation in MVS.
Specifically, in our Fast-MVSNet, we first construct a sparse cost volume for learning a sparse and high-resolution depth map.
Finally, a simple yet efficient Gauss-Newton layer is proposed to further optimize the depth map (a generic sketch of one such step appears at the end of this page).
(arXiv 2020-03-29)
- Towards Reading Beyond Faces for Sparsity-Aware 4D Affect Recognition [55.15661254072032]
We present a sparsity-aware deep network for automatic 4D facial expression recognition (FER).
We first propose a novel augmentation method to combat the data limitation problem for deep learning.
We then present a sparsity-aware deep network to compute the sparse representations of convolutional features over multi-views.
(arXiv 2020-02-08)
This list is automatically generated from the titles and abstracts of the papers in this site.
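Two generic sketches follow for techniques mentioned in the list above. First, epipolar-line sampling, as referenced in the depth-range-free MVS entry: given a fundamental matrix F, the epipolar line of a reference pixel in the source image is l = F x, and candidate matches can be sampled along it. All names and inputs here are illustrative assumptions, not code from that paper.

```python
# Illustrative only: F, the reference pixel, and the sampling range are
# assumed inputs.
import numpy as np

def sample_epipolar_line(F, x_ref, x_range, n_samples=32):
    """F: [3, 3] fundamental matrix (line in source image = F @ ref point).
    x_ref: (u, v) pixel in the reference image.
    x_range: (x_min, x_max) horizontal extent to sample in the source image.
    Returns [n_samples, 2] candidate pixel coordinates on the epipolar line."""
    u, v = x_ref
    a, b, c = F @ np.array([u, v, 1.0])   # line: a*x + b*y + c = 0
    xs = np.linspace(x_range[0], x_range[1], n_samples)
    ys = -(a * xs + c) / b                # assumes a non-vertical line (b != 0)
    return np.stack([xs, ys], axis=1)
```

A matching cost (and hence the quality of each candidate correspondence) can then be evaluated at the returned coordinates.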
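Second, a single Gauss-Newton update in the spirit of the Fast-MVSNet entry's Gauss-Newton layer. This is a sketch under assumptions, not the authors' implementation: residual_fn is an assumed callable producing per-pixel matching residuals, and the Jacobian is taken by finite differences for clarity, whereas a real layer would differentiate analytically through the feature warping.

```python
# Generic sketch: one damped Gauss-Newton step on a per-pixel depth map.
import torch

def gauss_newton_depth_step(depth, residual_fn, eps=1e-3, damping=1e-6):
    """One per-pixel Gauss-Newton step: d' = d - (J^T J)^-1 J^T r.

    depth:       [H, W] current depth estimate.
    residual_fn: assumed callable, depth -> [C, H, W] matching residuals.
    """
    r = residual_fn(depth)                    # residuals at the current depth
    # Finite-difference Jacobian of the residuals w.r.t. depth, per pixel.
    J = (residual_fn(depth + eps) - r) / eps  # [C, H, W]
    # Scalar normal equations per pixel; damping guards against tiny J^T J.
    JtJ = (J * J).sum(dim=0) + damping        # [H, W]
    Jtr = (J * r).sum(dim=0)                  # [H, W]
    return depth - Jtr / JtJ
```

Because depth is a scalar per pixel, the normal equations reduce to an elementwise division, which is what makes such a layer cheap enough to run at high resolution.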