GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View
Stereo
- URL: http://arxiv.org/abs/2310.19583v3
- Date: Thu, 21 Dec 2023 15:14:22 GMT
- Title: GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View
Stereo
- Authors: Vibhas K. Vats, Sripad Joshi, David J. Crandall, Md. Alimoor Reza,
Soon-heung Jung
- Abstract summary: We present a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at different scales during learning.
We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels.
Our experiments show that our approach achieves a new state-of-the-art on the DTU and BlendedMVS datasets, and competitive results on the Tanks and Temples benchmark.
- Score: 10.732653898606253
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditional multi-view stereo (MVS) methods rely heavily on photometric and
geometric consistency constraints, but newer machine learning-based MVS methods
check geometric consistency across multiple source views only as a
post-processing step. In this paper, we present a novel approach that
explicitly encourages geometric consistency of reference view depth maps across
multiple source views at different scales during learning (see Fig. 1). We find
that adding this geometric consistency loss significantly accelerates learning
by explicitly penalizing geometrically inconsistent pixels, reducing the
training iteration requirements to nearly half that of other MVS methods. Our
extensive experiments show that our approach achieves a new state-of-the-art on
the DTU and BlendedMVS datasets, and competitive results on the Tanks and
Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt
to enforce multi-view, multi-scale geometric consistency during learning.
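The core operation behind such a geometric consistency check (which, per the abstract, GC-MVSNet turns into a training-time loss applied across multiple source views and scales rather than a post-processing filter) can be illustrated with a single-view, single-scale forward-backward reprojection test. The sketch below is illustrative only: the function name, thresholds, and nearest-neighbour depth sampling are assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

def geometric_consistency_mask(depth_ref, depth_src, K_ref, K_src, T_ref2src,
                               pix_thresh=1.0, depth_thresh=0.01):
    """Forward-backward reprojection check between a reference depth map and
    one source depth map. Returns a boolean mask of reference pixels whose
    depth is geometrically consistent with the source view."""
    h, w = depth_ref.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # 3 x N

    # Lift reference pixels to 3D and transform into the source camera frame.
    pts_ref = np.linalg.inv(K_ref) @ pix * depth_ref.reshape(1, -1)
    pts_src = T_ref2src[:3, :3] @ pts_ref + T_ref2src[:3, 3:4]

    # Project into the source image.
    proj = K_src @ pts_src
    u_src = proj[0] / proj[2]
    v_src = proj[1] / proj[2]

    # Sample the source depth map (nearest neighbour for simplicity).
    ui = np.clip(np.round(u_src).astype(int), 0, w - 1)
    vi = np.clip(np.round(v_src).astype(int), 0, h - 1)
    d_src = depth_src[vi, ui]

    # Reproject back into the reference view using the sampled source depth.
    pts_back = np.linalg.inv(K_src) @ np.stack(
        [u_src, v_src, np.ones_like(u_src)]) * d_src
    T_src2ref = np.linalg.inv(T_ref2src)
    pts_back_ref = T_src2ref[:3, :3] @ pts_back + T_src2ref[:3, 3:4]
    proj_back = K_ref @ pts_back_ref
    u_back = proj_back[0] / proj_back[2]
    v_back = proj_back[1] / proj_back[2]
    d_back = proj_back[2]

    # A pixel is consistent when both the reprojection error and the
    # relative depth error are below their thresholds.
    pix_err = np.hypot(u_back - pix[0], v_back - pix[1])
    depth_err = np.abs(d_back - depth_ref.reshape(-1)) / depth_ref.reshape(-1)
    return ((pix_err < pix_thresh) & (depth_err < depth_thresh)).reshape(h, w)
```

In a loss formulation along the lines the abstract describes, the fraction of pixels failing such a check across several source views (and at several scales) would be penalized during training, rather than used only to filter depth maps afterwards.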
Related papers
- Towards Geometric-Photometric Joint Alignment for Facial Mesh
Registration [3.588864037082647]
This paper presents a Geometric-Photometric Joint Alignment method for accurately aligning human expressions by combining geometric and photometric information.
Experimental results demonstrate faithful alignment under various expressions, surpassing conventional ICP-based methods and a state-of-the-art deep learning-based method.
In practice, our method improves the efficiency of obtaining topology-consistent face models from multi-view stereo facial scanning.
arXiv Detail & Related papers (2024-03-05T03:39:23Z) - MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View
Stereo [7.130834755320434]
We propose a resilient and effective multi-view stereo approach (MP-MVS).
We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth in untextured areas.
In contrast with other multi-scale approaches, mPM is faster and can easily be extended to PatchMatch-based MVS approaches.
arXiv Detail & Related papers (2023-09-23T07:30:42Z) - MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z) - Deep PatchMatch MVS with Learned Patch Coplanarity, Geometric
Consistency and Adaptive Pixel Sampling [19.412014102866507]
We build on learning-based approaches to improve photometric scores by learning patch coplanarity and to encourage geometric consistency.
We propose an adaptive pixel sampling strategy for candidate propagation that reduces memory to enable training on larger resolution with more views and a larger encoder.
arXiv Detail & Related papers (2022-10-14T07:29:03Z) - RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering [16.679446000660654]
We propose a novel approach with neural rendering (RC-MVSNet) to resolve the ambiguity of correspondences among views.
Specifically, we impose a depth rendering consistency loss to constrain the geometry features close to the object surface.
We also introduce a reference view loss to generate consistent supervision, even for non-Lambertian surfaces.
arXiv Detail & Related papers (2022-03-08T09:24:05Z) - PatchMVSNet: Patch-wise Unsupervised Multi-View Stereo for
Weakly-Textured Surface Reconstruction [2.9896482273918434]
This paper proposes robust loss functions that leverage constraints underlying multi-view images to alleviate matching ambiguity.
Our strategy can be implemented with arbitrary depth estimation frameworks and trained on arbitrary large-scale MVS datasets.
Our method matches the performance of state-of-the-art methods on popular benchmarks such as DTU, Tanks and Temples, and ETH3D.
arXiv Detail & Related papers (2022-03-04T07:05:23Z) - Isometric Multi-Shape Matching [50.86135294068138]
Finding correspondences between shapes is a fundamental problem in computer vision and graphics.
While isometries are often studied in shape correspondence problems, they have not been considered explicitly in the multi-matching setting.
We present a suitable optimisation algorithm for solving our formulation and provide a convergence and complexity analysis.
arXiv Detail & Related papers (2020-12-04T15:58:34Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Recurrent Multi-view Alignment Network for Unsupervised Surface
Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z) - Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency
Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, 2) a HU-LSTM (Hybrid U-LSTM) to regularize 3D matching volume into predicted depth map.
Our method exhibits competitive performance with the state-of-the-art method while dramatically reducing memory consumption, using only 19.4% of R-MVSNet's memory.
arXiv Detail & Related papers (2020-07-21T14:59:59Z) - Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
arXiv Detail & Related papers (2020-04-30T17:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.