PatchmatchNet: Learned Multi-View Patchmatch Stereo
- URL: http://arxiv.org/abs/2012.01411v1
- Date: Wed, 2 Dec 2020 18:59:02 GMT
- Title: PatchmatchNet: Learned Multi-View Patchmatch Stereo
- Authors: Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Pablo Speciale,
Marc Pollefeys
- Abstract summary: We present PatchmatchNet, a novel and learnable cascade formulation of Patchmatch for high-resolution multi-view stereo.
With high speed and low memory requirements, PatchmatchNet can process higher resolution imagery and is more suited to running on resource-limited devices than competitors that employ 3D cost volume regularization.
- Score: 70.14789588576438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present PatchmatchNet, a novel and learnable cascade formulation of
Patchmatch for high-resolution multi-view stereo. With high computation speed
and low memory requirement, PatchmatchNet can process higher resolution imagery
and is more suited to running on resource-limited devices than competitors that
employ 3D cost volume regularization. For the first time we introduce an
iterative multi-scale Patchmatch in an end-to-end trainable architecture and
improve the Patchmatch core algorithm with a novel and learned adaptive
propagation and evaluation scheme for each iteration. Extensive experiments
show very competitive performance and generalization for our method on DTU,
Tanks & Temples and ETH3D, while being significantly more efficient than all
existing top-performing models: at least two and a half times faster than
state-of-the-art methods, with half the memory usage.
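To make the adaptive propagation and evaluation scheme concrete, below is a minimal sketch of one learned Patchmatch iteration. It is an illustrative assumption, not the authors' implementation: the names (`PatchmatchIteration`, `offset_net`, `score_net`), the channel counts, and the toy cost function are hypothetical. Adaptive propagation is modelled as sampling depth hypotheses at learned per-pixel neighbour offsets; adaptive evaluation as a learned score per hypothesis followed by softmax-weighted depth regression.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchmatchIteration(nn.Module):
    """One Patchmatch step: adaptive propagation + adaptive evaluation (sketch)."""

    def __init__(self, feat_ch=16, n_neighbors=8):
        super().__init__()
        self.n = n_neighbors
        # Adaptive propagation: learn one (dx, dy) offset per neighbour and pixel.
        self.offset_net = nn.Conv2d(feat_ch, 2 * n_neighbors, 3, padding=1)
        # Adaptive evaluation: score a hypothesis from features and its matching cost.
        self.score_net = nn.Conv2d(feat_ch + 1, 1, 3, padding=1)

    def forward(self, depth, ref_feat, cost_fn):
        b, _, h, w = depth.shape
        # Base pixel grid in the normalized [-1, 1] coordinates grid_sample expects.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=depth.device),
            torch.linspace(-1, 1, w, device=depth.device), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)

        offsets = self.offset_net(ref_feat)            # (b, 2n, h, w)
        hyps, scores = [], []
        for i in range(self.n):
            off = offsets[:, 2 * i:2 * i + 2].permute(0, 2, 3, 1)
            # Propagation: borrow a depth hypothesis from a learned neighbour location.
            d_i = F.grid_sample(depth, grid + off, align_corners=True)
            # Evaluation: score the hypothesis; cost_fn stands in for photometric matching.
            s_i = self.score_net(torch.cat((ref_feat, cost_fn(d_i)), dim=1))
            hyps.append(d_i)
            scores.append(s_i)
        weights = torch.softmax(torch.cat(scores, dim=1), dim=1)  # (b, n, h, w)
        # Refined depth = score-weighted average of the propagated hypotheses.
        return (torch.cat(hyps, dim=1) * weights).sum(dim=1, keepdim=True)

# Toy usage: a dummy cost that prefers depths near 1.0 pulls the map toward 1.0.
step = PatchmatchIteration()
depth0, feat = torch.rand(1, 1, 32, 32), torch.rand(1, 16, 32, 32)
depth1 = step(depth0, feat, cost_fn=lambda d: -(d - 1.0).abs())
```

In the full method this step would be repeated across scales, with the matching cost computed by warping source-view features; the dummy cost above merely keeps the sketch self-contained.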
Related papers
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z)
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z)
- MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View Stereo [7.130834755320434]
We propose a resilient and effective multi-view stereo approach (MP-MVS).
We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth in untextured areas.
In contrast with other multi-scale approaches, mPM is faster and can be easily extended to PatchMatch-based MVS approaches.
arXiv Detail & Related papers (2023-09-23T07:30:42Z)
- Deep PatchMatch MVS with Learned Patch Coplanarity, Geometric Consistency and Adaptive Pixel Sampling [19.412014102866507]
We build on learning-based approaches to improve photometric scores by learning patch coplanarity and encourage geometric consistency.
We propose an adaptive pixel sampling strategy for candidate propagation that reduces memory to enable training on larger resolution with more views and a larger encoder.
arXiv Detail & Related papers (2022-10-14T07:29:03Z)
- Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
arXiv Detail & Related papers (2021-12-11T14:41:05Z)
- IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo [71.84742490020611]
IterMVS is a new data-driven method for high-resolution multi-view stereo.
We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state (a minimal sketch of this idea appears after this list).
We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
arXiv Detail & Related papers (2021-12-09T18:58:02Z)
- PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility [23.427619869594437]
We propose an end-to-end trainable PatchMatch-based MVS approach that combines advantages of trainable costs and regularizations with pixelwise estimates.
We evaluate our method on widely used MVS benchmarks, ETH3D and Tanks and Temples (TnT).
arXiv Detail & Related papers (2021-08-19T23:14:48Z)
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach of independently mapping text and vision to a joint embedding space, a.k.a. dual encoders, is attractive because it scales well for retrieval.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
arXiv Detail & Related papers (2021-03-30T17:57:08Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
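As referenced in the IterMVS entry above, a recurrent estimator that keeps per-pixel depth probabilities in its hidden state can be pictured with a short sketch. This is a hypothetical illustration, not the IterMVS code: the class name `ConvGRUDepth`, the gate layout, the channel counts, and the depth-bin read-out are all assumptions; it only demonstrates decoding a categorical distribution over depth bins from a convolutional GRU state.

```python
import torch
import torch.nn as nn

class ConvGRUDepth(nn.Module):
    """Conv-GRU whose hidden state carries a per-pixel depth distribution (sketch)."""

    def __init__(self, hidden_ch=32, cost_ch=8, n_bins=64):
        super().__init__()
        in_ch = hidden_ch + cost_ch
        self.update = nn.Conv2d(in_ch, hidden_ch, 3, padding=1)  # z gate
        self.reset = nn.Conv2d(in_ch, hidden_ch, 3, padding=1)   # r gate
        self.cand = nn.Conv2d(in_ch, hidden_ch, 3, padding=1)    # candidate state
        self.prob_head = nn.Conv2d(hidden_ch, n_bins, 3, padding=1)

    def forward(self, h, cost):
        x = torch.cat((h, cost), dim=1)
        z = torch.sigmoid(self.update(x))
        r = torch.sigmoid(self.reset(x))
        h_tilde = torch.tanh(self.cand(torch.cat((r * h, cost), dim=1)))
        h = (1 - z) * h + z * h_tilde
        # Decode a per-pixel categorical probability distribution over depth bins.
        prob = torch.softmax(self.prob_head(h), dim=1)
        return h, prob

# Toy usage: iterate a few steps with random matching costs; depth is read out
# as the expectation over bin centres.
gru = ConvGRUDepth()
h = torch.zeros(1, 32, 24, 32)
bins = torch.linspace(0.5, 10.0, 64).view(1, 64, 1, 1)
for _ in range(4):
    h, prob = gru(h, torch.rand(1, 8, 24, 32))
depth = (prob * bins).sum(dim=1, keepdim=True)  # (1, 1, 24, 32)
```

Reading depth out as the expectation over bin centres keeps the estimator differentiable, while the full distribution remains available for confidence estimation.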