RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo
- URL: http://arxiv.org/abs/2205.14320v3
- Date: Wed, 22 Mar 2023 00:55:32 GMT
- Title: RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo
- Authors: Changjiang Cai, Pan Ji, Qingan Yan, Yi Xu
- Abstract summary: A "learning-to-optimize" paradigm iteratively indexes a plane-sweeping cost volume and regresses the depth map via a convolutional Gated Recurrent Unit (GRU).
We conduct extensive experiments on real-world MVS datasets and show that our method achieves state-of-the-art performance.
- Score: 22.32720993997916
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a learning-based method for multi-view depth estimation
from posed images. Our core idea is a "learning-to-optimize" paradigm that
iteratively indexes a plane-sweeping cost volume and regresses the depth map
via a convolutional Gated Recurrent Unit (GRU). Since the cost volume plays a
paramount role in encoding the multi-view geometry, we aim to improve its
construction at both the pixel and frame levels. At the pixel level, we propose
to break the symmetry of the Siamese network (which is typically used in MVS to
extract image features) by introducing a transformer block to the reference
image (but not to the source images). Such an asymmetric volume allows the
network to extract global features from the reference image to predict its
depth map. Given potential inaccuracies in the poses between reference and
source images, we propose to incorporate a residual pose network to correct the
relative poses. This essentially rectifies the cost volume at the frame level.
We conduct extensive experiments on real-world MVS datasets and show that our
method achieves state-of-the-art performance in terms of both within-dataset
evaluation and cross-dataset generalization.
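To make the "learning-to-optimize" loop concrete, here is a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper. The feature maps are given, so there is no Siamese or transformer extractor. There is a single source view whose pose is accurate, so there is no residual pose correction. Warping uses nearest-neighbour sampling, and the matching cost is a channel-averaged dot product. "Indexing" means gathering a fixed-radius window of depth hypotheses around the current estimate. The learned convolutional GRU is replaced by a hypothetical `update_depth` that moves each pixel to the best hypothesis in its window. All function names are illustrative.

```python
import numpy as np


def plane_sweep_cost_volume(ref_feat, src_feat, K, R, t, depths):
    """Correlation cost volume of shape (D, H, W) over fronto-parallel depth planes.

    ref_feat, src_feat: (C, H, W) feature maps; K: (3, 3) intrinsics at feature
    resolution; R (3, 3), t (3,): pose mapping reference-camera to source-camera
    coordinates; depths: (D,) NumPy array of depth hypotheses, sorted ascending.
    """
    C, H, W = ref_feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # homogeneous pixels (3, HW)
    rays = np.linalg.inv(K) @ pix                             # back-projected rays at depth 1
    ref_flat = ref_feat.reshape(C, -1).astype(float)
    cost = np.zeros((len(depths), H, W))
    for i, d in enumerate(depths):
        proj = K @ (R @ (rays * d) + t[:, None])  # project plane points into the source view
        z = proj[2]
        valid = z > 1e-6
        z_safe = np.where(valid, z, 1.0)
        # Nearest-neighbour sampling for brevity (bilinear sampling is the usual choice).
        u = np.round(proj[0] / z_safe).astype(int)
        v = np.round(proj[1] / z_safe).astype(int)
        valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)
        warped = np.zeros_like(ref_flat)
        warped[:, valid] = src_feat[:, v[valid], u[valid]]
        # Channel-averaged dot product; pixels that leave the source view get zero cost.
        cost[i] = (ref_flat * warped).mean(axis=0).reshape(H, W)
    return cost


def index_cost_volume(cost, depths, depth_est, radius=2):
    """Gather costs at the 2*radius+1 hypotheses around each pixel's current depth."""
    D = len(depths)
    center = np.abs(depths[:, None, None] - depth_est[None]).argmin(axis=0)  # (H, W)
    offsets = np.arange(-radius, radius + 1)[:, None, None]
    idx = np.clip(center[None] + offsets, 0, D - 1)  # (2*radius+1, H, W), clamped at the ends
    return np.take_along_axis(cost, idx, axis=0), idx


def update_depth(local_cost, idx, depths):
    """Hypothetical stand-in for the learned ConvGRU: jump to the best hypothesis in the window."""
    best = local_cost.argmax(axis=0)[None]  # (1, H, W) position inside the window
    return depths[np.take_along_axis(idx, best, axis=0)[0]]


def recurrent_depth(ref_feat, src_feat, K, R, t, depths, iters=4, radius=2, init_depth=None):
    """Build the volume once, then repeatedly index it and update the depth map."""
    cost = plane_sweep_cost_volume(ref_feat, src_feat, K, R, t, depths)
    H, W = cost.shape[1:]
    start = depths[len(depths) // 2] if init_depth is None else init_depth
    depth = np.full((H, W), start, dtype=float)
    for _ in range(iters):
        local_cost, idx = index_cost_volume(cost, depths, depth, radius)
        depth = update_depth(local_cost, idx, depths)
    return depth
```

As a usage example, take unit-normalized features and shift them two columns to the right to form the source map. With focal length 10 and `t = (0.2, 0, 0)`, this simulates a fronto-parallel plane at depth 1.0. Starting from depth 0.5, the loop then recovers 1.0 for every pixel whose match stays in view. With several source views, the per-view volumes would have to be aggregated before indexing, and averaging is one simple option. In the paper itself, the update step is a learned convolutional GRU that regresses depth rather than a hard argmax.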
Related papers
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
Date: 2023-12-24T08:42:37Z
- BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images [13.258689143949912]
We propose an end-to-end visual semantic localization neural network using multi-view camera images.
The BEV-Locator is capable of estimating vehicle poses in versatile scenarios.
Experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251° in lateral translation, longitudinal translation, and heading angle, respectively.
Date: 2022-11-27T20:24:56Z
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
Date: 2022-08-11T17:59:59Z
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
Date: 2022-07-09T13:35:12Z
- Multi-Frame Self-Supervised Depth with Transformers [33.00363651105475]
We propose a novel transformer architecture for cost volume generation.
We use depth-discretized epipolar sampling to select matching candidates.
We refine predictions through a series of self- and cross-attention layers.
Date: 2022-04-15T19:04:57Z
- Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
Date: 2021-12-11T14:41:05Z
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo (MVPS) problem.
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
Date: 2021-10-11T20:20:03Z
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose adopting graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves state-of-the-art performance on two benchmarks.
Date: 2020-08-25T06:00:06Z
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct high-quality geometry and complex, spatially varying BRDFs of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
Date: 2020-03-27T21:28:54Z