Multiview Stereo with Cascaded Epipolar RAFT
- URL: http://arxiv.org/abs/2205.04502v1
- Date: Mon, 9 May 2022 18:17:05 GMT
- Title: Multiview Stereo with Cascaded Epipolar RAFT
- Authors: Zeyu Ma, Zachary Teed, Jia Deng
- Abstract summary: We address multiview stereo (MVS), an important 3D vision task that reconstructs a 3D model such as a dense point cloud from multiple calibrated images.
We propose CER-MVS, a new approach based on the RAFT (Recurrent All-Pairs Field Transforms) architecture developed for optical flow. CER-MVS introduces five new changes to RAFT: epipolar cost volumes, cost volume cascading, multiview fusion of cost volumes, dynamic supervision, and multiresolution fusion of depth maps.
- Score: 73.7619703879639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address multiview stereo (MVS), an important 3D vision task that
reconstructs a 3D model such as a dense point cloud from multiple calibrated
images. We propose CER-MVS (Cascaded Epipolar RAFT Multiview Stereo), a new
approach based on the RAFT (Recurrent All-Pairs Field Transforms) architecture
developed for optical flow. CER-MVS introduces five new changes to RAFT:
epipolar cost volumes, cost volume cascading, multiview fusion of cost volumes,
dynamic supervision, and multiresolution fusion of depth maps. CER-MVS is
significantly different from prior work in multiview stereo. Unlike prior work,
which operates by updating a 3D cost volume, CER-MVS operates by updating a
disparity field. Furthermore, we propose an adaptive thresholding method to
balance the completeness and accuracy of the reconstructed point clouds.
Experiments show that our approach achieves competitive performance on DTU (the
second best among known results) and state-of-the-art performance on the
Tanks-and-Temples benchmark (both the intermediate and advanced set). Code is
available at https://github.com/princeton-vl/CER-MVS
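To make the recurrent-update idea concrete, below is a minimal PyTorch-style sketch of a RAFT-style operator that iteratively refines a 2D disparity field by indexing an epipolar cost volume. This is a sketch under stated assumptions, not the authors' implementation: the class name `DisparityUpdater`, the single-conv stand-in for RAFT's ConvGRU, the tensor shapes, and the iteration count are all illustrative, and the cascading, multiview cost fusion, dynamic supervision, and multiresolution depth fusion described above are omitted.

```python
# Minimal sketch of a RAFT-style iterative disparity-field update.
# Not the authors' code: module names, shapes, and hyperparameters are assumed.
import torch
import torch.nn as nn

class DisparityUpdater(nn.Module):
    """Iteratively refines a disparity field from epipolar cost-volume lookups."""
    def __init__(self, feat_dim=128, num_samples=9):
        super().__init__()
        self.num_samples = num_samples
        # Single conv as a stand-in for RAFT's ConvGRU update operator.
        self.gru = nn.Conv2d(feat_dim + num_samples + 1, feat_dim, 3, padding=1)
        self.delta_head = nn.Conv2d(feat_dim, 1, 3, padding=1)

    def lookup(self, cost_volume, disparity):
        # Sample costs at offsets around the current estimate.
        # cost_volume: B x D x H x W (D disparity bins); disparity: B x 1 x H x W.
        B, D, H, W = cost_volume.shape
        half = self.num_samples // 2
        offsets = torch.arange(-half, half + 1,
                               device=disparity.device).view(1, -1, 1, 1)
        idx = (disparity + offsets).clamp(0, D - 1).long()   # B x S x H x W
        return torch.gather(cost_volume, 1, idx)

    def forward(self, cost_volume, hidden, disparity, iters=8):
        for _ in range(iters):
            costs = self.lookup(cost_volume, disparity)
            hidden = torch.tanh(self.gru(
                torch.cat([hidden, costs, disparity], dim=1)))
            disparity = disparity + self.delta_head(hidden)  # residual update
        return disparity
```

In the paper, the cost volume itself is built by correlating reference features with neighbor-view features sampled along epipolar lines and is fused across views; the sketch assumes it is given.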
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
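As a toy illustration of sampling candidate points along the epipolar line of a source image (the function name, the anchoring heuristic, and the fixed sample spacing below are assumptions, not details from the paper):

```python
# Hypothetical sketch of epipolar-line sampling; not taken from the paper.
import numpy as np

def sample_epipolar_line(F, x_ref, num_samples=16, spacing=4.0):
    """Sample pixel coordinates along the source-image epipolar line of x_ref.

    F     : 3x3 fundamental matrix (reference pixel -> source epipolar line).
    x_ref : (u, v) pixel in the reference image.
    """
    a, b, c = F @ np.array([x_ref[0], x_ref[1], 1.0])  # line a*u + b*v + c = 0
    direction = np.array([-b, a]) / np.hypot(a, b)     # unit vector along line
    # Heuristic anchor: project x_ref's coordinates onto the line (assumes
    # roughly aligned views); a real system would anchor using depth priors.
    p = np.asarray(x_ref, dtype=float)
    anchor = p - (a * p[0] + b * p[1] + c) / (a * a + b * b) * np.array([a, b])
    steps = (np.arange(num_samples) - num_samples / 2) * spacing
    return anchor + steps[:, None] * direction         # num_samples x 2
```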
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo [60.75684891484619]
We introduce MVSFormer++, a method that maximizes the inherent characteristics of attention to enhance various components of the MVS pipeline.
We employ different attention mechanisms for the feature encoder and cost volume regularization, focusing on feature and spatial aggregations respectively.
Comprehensive experiments on DTU, Tanks-and-Temples, BlendedMVS, and ETH3D validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-01-22T03:22:49Z)
- MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View Stereo [7.130834755320434]
We propose a resilient and effective multi-view stereo approach (MP-MVS)
We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth of untextured areas.
In contrast with other multi-scale approaches, it is faster and can be easily extended to PatchMatch-based MVS approaches.
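For background, here is a toy sketch of the plain PatchMatch propagation step that multi-scale variants such as mPM build on; the raster-order sweep and the `cost_fn` interface are illustrative assumptions, not the paper's design:

```python
# Toy PatchMatch propagation sweep; illustrative only.
import numpy as np

def propagate(depth, cost_fn):
    """One raster-order sweep: adopt a neighbor's depth hypothesis whenever it
    yields a lower matching cost. cost_fn(u, v, d) -> photometric cost."""
    H, W = depth.shape
    for v in range(H):
        for u in range(W):
            best_d = depth[v, u]
            best_c = cost_fn(u, v, best_d)
            for dv, du in ((-1, 0), (0, -1)):      # already-visited neighbors
                nv, nu = v + dv, u + du
                if nv >= 0 and nu >= 0:
                    c = cost_fn(u, v, depth[nv, nu])
                    if c < best_c:
                        best_d, best_c = depth[nv, nu], c
            depth[v, u] = best_d
    return depth  # real implementations alternate sweep directions
```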
arXiv Detail & Related papers (2023-09-23T07:30:42Z)
- WT-MVSNet: Window-based Transformers for Multi-view Stereo [12.25150988628149]
We introduce a Window-based Epipolar Transformer (WET) which reduces matching redundancy by using epipolar constraints.
A second Shifted WT is employed for aggregating global information within the cost volume.
We present a novel Cost Transformer (CT) to replace 3D convolutions for cost volume regularization.
arXiv Detail & Related papers (2022-05-28T03:32:09Z)
- MVSTER: Epipolar Transformer for Efficient Multi-View Stereo [26.640495084316925]
Learning-based Multi-View Stereo methods warp source images into 3D volumes, which are fused as a cost volume to be regularized by subsequent networks.
Previous methods utilize extra networks to learn 2D information as fusing cues, underusing 3D spatial correlations and bringing additional computation costs.
We present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently.
arXiv Detail & Related papers (2022-04-15T06:47:57Z)
- RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo [35.22032072756035]
RayMVSNet learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth.
Our method ranks top on both the DTU and the Tanks & Temples datasets over all previous learning-based methods.
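A minimal sketch of the depth-recovery step this implies: locate the zero-crossing of field values sampled along a ray and interpolate (the sign convention and sampling layout are assumptions):

```python
# Sketch of recovering depth from a 1D implicit field along a camera ray.
# Sample counts, names, and sign convention are illustrative assumptions.
import numpy as np

def depth_from_zero_crossing(t, f):
    """t: sorted sample depths along the ray; f: field values at those samples
    (assumed positive in front of the surface, non-positive behind). Returns
    the interpolated depth of the first sign change, or None if none exists."""
    crossings = np.where((f[:-1] > 0) & (f[1:] <= 0))[0]
    if crossings.size == 0:
        return None                      # ray misses the surface
    i = crossings[0]
    # Linear interpolation between the two samples bracketing the crossing.
    w = f[i] / (f[i] - f[i + 1])
    return t[i] + w * (t[i + 1] - t[i])
```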
arXiv Detail & Related papers (2022-04-04T08:43:38Z)
- Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
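As a rough illustration only: CDSFNet learns its scale selection, but a hand-crafted analogue might map a curvature estimate to a per-pixel kernel size, as in this assumed heuristic:

```python
# Assumed heuristic for curvature-guided scale selection; CDSFNet's learned
# rule is different, so this is purely illustrative.
import numpy as np

def select_scale(img, scales=(3, 5, 7)):
    """High-curvature pixels get small patches; flat regions get large ones."""
    gy, gx = np.gradient(img.astype(np.float64))
    gyy = np.gradient(gy, axis=0)
    gxx = np.gradient(gx, axis=1)
    # Crude curvature proxy: normalized Laplacian of the intensity surface.
    curvature = np.abs(gxx + gyy) / (1.0 + gx**2 + gy**2) ** 1.5
    # Quantile thresholds split pixels evenly across the available scales.
    qs = np.quantile(curvature, [1 - (i + 1) / len(scales)
                                 for i in range(len(scales) - 1)])
    idx = np.zeros(curvature.shape, dtype=int)
    for q in qs:
        idx += (curvature < q).astype(int)
    return np.take(scales, idx)  # per-pixel kernel size
```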
arXiv Detail & Related papers (2021-12-11T14:41:05Z)
- IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo [71.84742490020611]
IterMVS is a new data-driven method for high-resolution multi-view stereo.
We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state.
We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
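A simplified sketch of such an estimator: a convolutional GRU whose hidden state is decoded into a per-pixel probability distribution over depth bins (layer sizes, names, and the softmax head are assumptions):

```python
# Rough sketch of a GRU-based depth-probability estimator; module sizes and
# the decoding head are assumptions, not IterMVS's exact architecture.
import torch
import torch.nn as nn

class ProbEstimator(nn.Module):
    def __init__(self, in_dim=64, hid_dim=32, num_bins=64):
        super().__init__()
        self.convz = nn.Conv2d(in_dim + hid_dim, hid_dim, 3, padding=1)
        self.convr = nn.Conv2d(in_dim + hid_dim, hid_dim, 3, padding=1)
        self.convq = nn.Conv2d(in_dim + hid_dim, hid_dim, 3, padding=1)
        self.head = nn.Conv2d(hid_dim, num_bins, 1)

    def forward(self, x, h):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))          # update gate
        r = torch.sigmoid(self.convr(hx))          # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        h = (1 - z) * h + z * q                    # new hidden state
        prob = torch.softmax(self.head(h), dim=1)  # per-pixel depth distribution
        return h, prob
```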
arXiv Detail & Related papers (2021-12-09T18:58:02Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module that extracts dense feature maps at the original size with multi-scale context information, and 2) an HU-LSTM (Hybrid U-LSTM) that regularizes the 3D matching volume into a predicted depth map.
Our method exhibits performance competitive with the state-of-the-art method while dramatically reducing memory consumption, using only 19.4% of R-MVSNet's memory.
arXiv Detail & Related papers (2020-07-21T14:59:59Z)