PatchmatchNet: Learned Multi-View Patchmatch Stereo
- URL: http://arxiv.org/abs/2012.01411v1
- Date: Wed, 2 Dec 2020 18:59:02 GMT
- Title: PatchmatchNet: Learned Multi-View Patchmatch Stereo
- Authors: Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Pablo Speciale, Marc Pollefeys
- Abstract summary: We present PatchmatchNet, a novel and learnable cascade formulation of Patchmatch for high-resolution multi-view stereo.
With high speed and low memory requirement, PatchmatchNet can process higher resolution imagery and is more suited to run on resource limited devices than competitors that employ 3D cost volume regularization.
- Score: 70.14789588576438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present PatchmatchNet, a novel and learnable cascade formulation of
Patchmatch for high-resolution multi-view stereo. With high computation speed
and low memory requirement, PatchmatchNet can process higher resolution imagery
and is more suited to run on resource limited devices than competitors that
employ 3D cost volume regularization. For the first time we introduce an
iterative multi-scale Patchmatch in an end-to-end trainable architecture and
improve the Patchmatch core algorithm with a novel and learned adaptive
propagation and evaluation scheme for each iteration. Extensive experiments
show a very competitive performance and generalization for our method on DTU,
Tanks & Temples and ETH3D, but at a significantly higher efficiency than all
existing top-performing models: at least two and a half times faster than
state-of-the-art methods with twice less memory usage.
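PatchmatchNet is described as a learned, cascaded formulation of Patchmatch. For orientation, the sketch below shows the classical, hand-crafted PatchMatch loop (random initialization, propagation from neighbours, random search with a shrinking radius) that such a formulation improves on. It is deliberately simplified and is not PatchmatchNet's learned adaptive propagation or evaluation. Assumptions: a rectified two-view disparity setting on a tiny synthetic image instead of multi-view depth, a mean-absolute-difference cost over a 3×3 window, and integer disparities. The names `patch_cost` and `patchmatch_disparity` are hypothetical.

```python
import random

def patch_cost(left, right, y, x, d, r=1):
    """Mean absolute difference between the (2r+1)x(2r+1) patch around (y, x) in
    `left` and the same patch shifted left by disparity d in `right`.
    Pixels outside either image are skipped; returns inf if none remain."""
    h, w = len(left), len(left[0])
    total, n = 0.0, 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xl = y + dy, x + dx
            xr = xl - d
            if 0 <= yy < h and 0 <= xl < w and 0 <= xr < w:
                total += abs(left[yy][xl] - right[yy][xr])
                n += 1
    return total / n if n else float("inf")


def patchmatch_disparity(left, right, max_disp, iters=4, seed=0):
    """Classical (non-learned) PatchMatch: random init, neighbour propagation,
    and random search around the current best with a halving radius."""
    rng = random.Random(seed)
    h, w = len(left), len(left[0])
    # Random initialization: one integer disparity hypothesis per pixel.
    disp = [[rng.randint(0, max_disp) for _ in range(w)] for _ in range(h)]
    cost = [[patch_cost(left, right, y, x, disp[y][x]) for x in range(w)]
            for y in range(h)]

    def try_hypothesis(y, x, d):
        # Keep d only if it strictly lowers the matching cost at (y, x).
        c = patch_cost(left, right, y, x, d)
        if c < cost[y][x]:
            disp[y][x], cost[y][x] = d, c

    for it in range(iters):
        # Even iterations scan top-left -> bottom-right, odd ones the reverse,
        # so good hypotheses can spread in both directions.
        forward = it % 2 == 0
        step = -1 if forward else 1
        ys = range(h) if forward else range(h - 1, -1, -1)
        xs = range(w) if forward else range(w - 1, -1, -1)
        for y in ys:
            for x in xs:
                # Propagation: test the hypotheses of the already-visited neighbours.
                if 0 <= x + step < w:
                    try_hypothesis(y, x, disp[y][x + step])
                if 0 <= y + step < h:
                    try_hypothesis(y, x, disp[y + step][x])
                # Random search around the current best, halving the radius each time.
                radius = max_disp
                while radius >= 1:
                    d = disp[y][x] + rng.randint(-radius, radius)
                    try_hypothesis(y, x, min(max(d, 0), max_disp))
                    radius //= 2
    return disp


# Synthetic pair: `right` is `left` shifted by a true disparity of 2 pixels.
gen = random.Random(1)
H, W, TRUE_D = 8, 12, 2
left = [[gen.randint(0, 255) for _ in range(W)] for _ in range(H)]
right = [[left[y][x + TRUE_D] if x + TRUE_D < W else gen.randint(0, 255)
          for x in range(W)] for y in range(H)]
disp = patchmatch_disparity(left, right, max_disp=6)
```

Because a correct hypothesis reaches zero cost and is never replaced, a single lucky pixel is enough: one forward and one backward sweep spread it to every pixel whose patch overlaps the right image. A learned variant replaces the fixed neighbour pattern and the photometric cost with learned ones.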
Related papers
- MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View Stereo [7.130834755320434]
We propose a resilient and effective multi-view stereo approach (MP-MVS).
We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth in untextured areas.
In contrast with other multi-scale approaches, mPM is faster and can be easily extended to PatchMatch-based MVS approaches.
Date: 2023-09-23T07:30:42Z
- Deep PatchMatch MVS with Learned Patch Coplanarity, Geometric Consistency and Adaptive Pixel Sampling [19.412014102866507]
We build on learning-based approaches to improve photometric scores by learning patch coplanarity and encourage geometric consistency.
We propose an adaptive pixel sampling strategy for candidate propagation that reduces memory to enable training on larger resolution with more views and a larger encoder.
Date: 2022-10-14T07:29:03Z
- Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
Date: 2021-12-11T14:41:05Z
- IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo [71.84742490020611]
IterMVS is a new data-driven method for high-resolution multi-view stereo.
We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state.
We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
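The summary says IterMVS keeps a pixel-wise probability distribution over depth, but not how a single depth value is read out of it. As a hedged illustration of readouts commonly used in learned MVS (not necessarily the ones IterMVS uses), the sketch compares a regression readout (expectation), a classification readout (argmax), and a hybrid that takes the expectation only in a small window around the argmax. All function names and the toy hypotheses are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax: turns raw per-hypothesis scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_depth(probs, depths):
    """Regression readout: probability-weighted mean over all hypotheses."""
    return sum(p * d for p, d in zip(probs, depths))

def argmax_depth(probs, depths):
    """Classification readout: the single most probable hypothesis (first on ties)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return depths[best]

def local_expected_depth(probs, depths, radius=1):
    """Hybrid readout: expectation restricted to a window around the argmax,
    renormalised so the window's probabilities sum to one."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    lo, hi = max(0, best - radius), min(len(probs), best + radius + 1)
    mass = sum(probs[lo:hi])
    return sum(probs[i] * depths[i] for i in range(lo, hi)) / mass

depths = [1.0, 2.0, 3.0, 4.0]
bimodal = [0.45, 0.05, 0.05, 0.45]
print(expected_depth(bimodal, depths))        # about 2.5: between the two modes
print(argmax_depth(bimodal, depths))          # 1.0: first of the tied peaks
print(local_expected_depth(bimodal, depths))  # about 1.1: stays near one mode
```

The bimodal case shows why the choice matters: a plain expectation can land on a depth the distribution considers unlikely, while argmax-based readouts stay on a mode at the price of quantisation.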
Date: 2021-12-09T18:58:02Z
- PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility [23.427619869594437]
We propose an end-to-end trainable PatchMatch-based MVS approach that combines advantages of trainable costs and regularizations with pixelwise estimates.
We evaluate our method on widely used MVS benchmarks, ETH3D and Tanks and Temples (TnT).
Date: 2021-08-19T23:14:48Z
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the dual-encoder approach, which independently maps text and vision into a joint embedding space, is attractive because it scales well to large retrieval collections.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
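The contrast above is about where the cost goes. With dual encoders, gallery embeddings are computed once offline, so each query costs one dot product per item; a cross-attention model must run jointly on every (query, item) pair. The sketch below shows that fast path plus a generic "shortlist, then re-rank" step in which an expensive joint scorer is run only on the top-k. It is a hedged illustration of the general pattern, not the paper's method: the toy 2-D vectors stand in for real encoder outputs, and `slow_score` is a hypothetical callable for a cross-attention model.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def build_index(gallery_embs):
    """Offline step: gallery embeddings are normalised once and reused for every query."""
    return [normalize(g) for g in gallery_embs]

def search(index, query_emb, k):
    """Fast stage: one dot product per gallery item, then keep the top k indices."""
    q = normalize(query_emb)
    scored = [(sum(a * b for a, b in zip(q, g)), i) for i, g in enumerate(index)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

def rerank(query, candidates, slow_score):
    """Slow stage: an expensive joint scorer runs only on the short list.
    `slow_score(query, item_index)` is a hypothetical callable."""
    return sorted(candidates, key=lambda i: slow_score(query, i), reverse=True)

# Toy 2-D "embeddings" standing in for separate text and vision encoder outputs.
index = build_index([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
shortlist = search(index, [1.0, 0.1], k=2)
print(shortlist)  # [0, 2]
fake_joint_scores = {0: 0.2, 2: 0.9}  # made-up scores for illustration only
print(rerank("a query", shortlist, lambda q, i: fake_joint_scores[i]))  # [2, 0]
```

In a real system the linear scan in `search` would typically be replaced by an approximate nearest-neighbour index; the re-ranking cost then grows with k rather than with the gallery size.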
Date: 2021-03-30T17:57:08Z
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
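The one-line summary does not give the module's details. Under one hedged reading of the idea, a single shared 2D cost function is applied to each displacement slice separately, so the H×W×2C paired features exist only transiently per disparity and only a D×H×W volume of scalar costs is stored, rather than a full 4D feature volume. In the sketch, `shared_cost_2d` is a hypothetical fixed stand-in (per-pixel L1 followed by a 3×3 box filter) for a learned 2D network. Zero padding for out-of-range pixels and winner-takes-all disparity selection are also assumptions.

```python
import random

def shared_cost_2d(paired):
    """Stand-in for a learned 2D network: per-pixel L1 between the two feature
    halves, then a 3x3 box filter (a fixed 2D convolution). The same function
    is applied to every displacement slice."""
    h, w = len(paired), len(paired[0])
    raw = []
    for y in range(h):
        row = []
        for x in range(w):
            f = paired[y][x]
            c = len(f) // 2
            row.append(sum(abs(a - b) for a, b in zip(f[:c], f[c:])))
        raw.append(row)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [raw[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

def cost_volume(left_feat, right_feat, max_disp):
    """Loop over displacements; only the H x W cost map of each slice is kept."""
    h, w = len(left_feat), len(left_feat[0])
    volume = []
    for d in range(max_disp + 1):
        paired = []  # transient H x W x 2C slice for this displacement only
        for y in range(h):
            row = []
            for x in range(w):
                c = len(left_feat[y][x])
                fr = right_feat[y][x - d] if x - d >= 0 else [0.0] * c  # zero padding
                row.append(left_feat[y][x] + fr)  # concatenate feature lists
            paired.append(row)
        volume.append(shared_cost_2d(paired))
    return volume

def winner_takes_all(volume):
    """Pick the lowest-cost displacement at every pixel."""
    h, w = len(volume[0]), len(volume[0][0])
    return [[min(range(len(volume)), key=lambda d: volume[d][y][x]) for x in range(w)]
            for y in range(h)]

# Toy 1-channel features; `right` is `left` shifted by a true disparity of 1.
gen = random.Random(0)
H, W, TRUE_D = 5, 8, 1
left = [[[float(gen.randint(0, 255))] for _ in range(W)] for _ in range(H)]
right = [[left[y][x + TRUE_D] if x + TRUE_D < W else [float(gen.randint(0, 255))]
          for x in range(W)] for y in range(H)]
vol = cost_volume(left, right, max_disp=3)
disp = winner_takes_all(vol)
```

Memory for the stored volume scales with D×H×W instead of D×H×W×2C; the price is running the 2D function once per displacement.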
Date: 2020-12-01T23:58:16Z
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
Date: 2020-11-26T04:04:21Z
- Towards Fast, Accurate and Stable 3D Dense Face Alignment [73.01620081047336]
We propose a novel regression framework named 3DDFA-V2, which balances speed, accuracy and stability.
We present a virtual synthesis method that transforms one still image into a short video incorporating in-plane and out-of-plane face motion.
Date: 2020-09-21T15:37:37Z