GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View
Stereo
- URL: http://arxiv.org/abs/2310.19583v3
- Date: Thu, 21 Dec 2023 15:14:22 GMT
- Title: GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View
Stereo
- Authors: Vibhas K. Vats, Sripad Joshi, David J. Crandall, Md. Alimoor Reza,
Soon-heung Jung
- Abstract summary: We present a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at different scales during learning.
We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels.
Our experiments show that our approach achieves a new state-of-the-art on the DTU and BlendedMVS datasets, and competitive results on the Tanks and Temples benchmark.
- Score: 10.732653898606253
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditional multi-view stereo (MVS) methods rely heavily on photometric and
geometric consistency constraints, but newer machine learning-based MVS methods
check geometric consistency across multiple source views only as a
post-processing step. In this paper, we present a novel approach that
explicitly encourages geometric consistency of reference view depth maps across
multiple source views at different scales during learning (see Fig. 1). We find
that adding this geometric consistency loss significantly accelerates learning
by explicitly penalizing geometrically inconsistent pixels, reducing the
training iteration requirements to nearly half that of other MVS methods. Our
extensive experiments show that our approach achieves a new state-of-the-art on
the DTU and BlendedMVS datasets, and competitive results on the Tanks and
Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt
to enforce multi-view, multi-scale geometric consistency during learning.
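The core operation behind such a geometric consistency check (which, per the abstract, GC-MVSNet turns into a training-time loss applied across multiple source views and scales rather than a post-processing filter) can be illustrated with a single-view, single-scale forward-backward reprojection test. The sketch below is illustrative only: the function name, thresholds, and nearest-neighbour depth sampling are assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

def geometric_consistency_mask(depth_ref, depth_src, K_ref, K_src, T_ref2src,
                               pix_thresh=1.0, depth_thresh=0.01):
    """Forward-backward reprojection check between a reference depth map and
    one source depth map. Returns a boolean mask of reference pixels whose
    depth is geometrically consistent with the source view."""
    h, w = depth_ref.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # 3 x N

    # Lift reference pixels to 3D and transform into the source camera frame.
    pts_ref = np.linalg.inv(K_ref) @ pix * depth_ref.reshape(1, -1)
    pts_src = T_ref2src[:3, :3] @ pts_ref + T_ref2src[:3, 3:4]

    # Project into the source image.
    proj = K_src @ pts_src
    u_src = proj[0] / proj[2]
    v_src = proj[1] / proj[2]

    # Sample the source depth map (nearest neighbour for simplicity).
    ui = np.clip(np.round(u_src).astype(int), 0, w - 1)
    vi = np.clip(np.round(v_src).astype(int), 0, h - 1)
    d_src = depth_src[vi, ui]

    # Reproject back into the reference view using the sampled source depth.
    pts_back = np.linalg.inv(K_src) @ np.stack(
        [u_src, v_src, np.ones_like(u_src)]) * d_src
    T_src2ref = np.linalg.inv(T_ref2src)
    pts_back_ref = T_src2ref[:3, :3] @ pts_back + T_src2ref[:3, 3:4]
    proj_back = K_ref @ pts_back_ref
    u_back = proj_back[0] / proj_back[2]
    v_back = proj_back[1] / proj_back[2]
    d_back = proj_back[2]

    # A pixel is consistent when both the reprojection error and the
    # relative depth error are below their thresholds.
    pix_err = np.hypot(u_back - pix[0], v_back - pix[1])
    depth_err = np.abs(d_back - depth_ref.reshape(-1)) / depth_ref.reshape(-1)
    return ((pix_err < pix_thresh) & (depth_err < depth_thresh)).reshape(h, w)
```

In a loss formulation along the lines the abstract describes, the fraction of pixels failing such a check across several source views (and at several scales) would be penalized during training, rather than used only to filter depth maps afterwards.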
Related papers
- Towards Geometric-Photometric Joint Alignment for Facial Mesh
Registration [3.588864037082647]
This paper presents a Geometric-Photometric Joint Alignment method for accurately aligning human expressions by combining geometric and photometric information.
Experimental results demonstrate faithful alignment under various expressions, surpassing conventional ICP-based methods and a state-of-the-art deep learning-based method.
In practice, our method improves the efficiency of obtaining topology-consistent face models from multi-view stereo facial scanning.
arXiv Detail & Related papers (2024-03-05T03:39:23Z) - MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View
Stereo [7.130834755320434]
We propose a resilient and effective multi-view stereo approach (MP-MVS).
We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth in untextured areas.
In contrast with other multi-scale approaches, mPM is faster and can easily be extended to PatchMatch-based MVS approaches.
arXiv Detail & Related papers (2023-09-23T07:30:42Z) - MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z) - Deep PatchMatch MVS with Learned Patch Coplanarity, Geometric
Consistency and Adaptive Pixel Sampling [19.412014102866507]
We build on learning-based approaches to improve photometric scores by learning patch coplanarity and to encourage geometric consistency.
We propose an adaptive pixel sampling strategy for candidate propagation that reduces memory to enable training on larger resolution with more views and a larger encoder.
arXiv Detail & Related papers (2022-10-14T07:29:03Z) - RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering [16.679446000660654]
We propose a novel approach with neural rendering (RC-MVSNet) to resolve the ambiguity of correspondences among views.
Specifically, we impose a depth rendering consistency loss to constrain the geometry features close to the object surface.
We also introduce a reference view loss to generate consistent supervision, even for non-Lambertian surfaces.
arXiv Detail & Related papers (2022-03-08T09:24:05Z) - PatchMVSNet: Patch-wise Unsupervised Multi-View Stereo for
Weakly-Textured Surface Reconstruction [2.9896482273918434]
This paper proposes robust loss functions that leverage constraints underlying multi-view images to alleviate matching ambiguity.
Our strategy can be implemented with arbitrary depth estimation frameworks and trained on arbitrary large-scale MVS datasets.
Our method matches the performance of state-of-the-art methods on popular benchmarks such as DTU, Tanks and Temples, and ETH3D.
arXiv Detail & Related papers (2022-03-04T07:05:23Z) - Isometric Multi-Shape Matching [50.86135294068138]
Finding correspondences between shapes is a fundamental problem in computer vision and graphics.
While isometries are often studied in shape correspondence problems, they have not been considered explicitly in the multi-matching setting.
We present a suitable optimisation algorithm for solving our formulation and provide a convergence and complexity analysis.
arXiv Detail & Related papers (2020-12-04T15:58:34Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Recurrent Multi-view Alignment Network for Unsupervised Surface
Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z) - Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency
Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, 2) a HU-LSTM (Hybrid U-LSTM) to regularize 3D matching volume into predicted depth map.
Our method exhibits competitive performance with the state-of-the-art method while dramatically reducing memory consumption, using only 19.4% of R-MVSNet's memory.
arXiv Detail & Related papers (2020-07-21T14:59:59Z) - Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
arXiv Detail & Related papers (2020-04-30T17:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.