MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network
- URL: http://arxiv.org/abs/2507.11333v1
- Date: Tue, 15 Jul 2025 14:05:22 GMT
- Title: MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network
- Authors: Jianfei Jiang, Qiankun Liu, Haochen Yu, Hongyuan Liu, Liyong Wang, Jiansheng Chen, Huimin Ma
- Abstract summary: We propose MonoMVSNet, a novel monocular feature and depth guided MVS network. MonoMVSNet integrates powerful priors from a monocular foundation model into multi-view geometry. Experiments demonstrate that MonoMVSNet achieves state-of-the-art performance on the DTU and Tanks-and-Temples datasets.
- Score: 15.138039805633353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning-based Multi-View Stereo (MVS) methods aim to predict depth maps for a sequence of calibrated images to recover dense point clouds. However, existing MVS methods often struggle with challenging regions, such as textureless regions and reflective surfaces, where feature matching fails. In contrast, monocular depth estimation inherently does not require feature matching, allowing it to achieve robust relative depth estimation in these regions. To bridge this gap, we propose MonoMVSNet, a novel monocular feature and depth guided MVS network that integrates powerful priors from a monocular foundation model into multi-view geometry. Firstly, the monocular feature of the reference view is integrated into source view features by the attention mechanism with a newly designed cross-view position encoding. Then, the monocular depth of the reference view is aligned to dynamically update the depth candidates for edge regions during the sampling procedure. Finally, a relative consistency loss is further designed based on the monocular depth to supervise the depth prediction. Extensive experiments demonstrate that MonoMVSNet achieves state-of-the-art performance on the DTU and Tanks-and-Temples datasets, ranking first on the Tanks-and-Temples Intermediate and Advanced benchmarks. The source code is available at https://github.com/JianfeiJ/MonoMVSNet.
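The two depth-guidance steps described in the abstract (aligning the reference view's monocular depth to the metric MVS range, and a relative consistency loss) can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: `align_scale_shift` fits a per-image scale and shift by least squares, and `relative_consistency_loss` compares median-normalized depth maps as a common scale-invariant surrogate. All function and variable names are illustrative.

```python
import numpy as np

def align_scale_shift(mono_depth, mvs_depth, mask):
    """Fit scale s and shift b so that s * mono + b ~ mvs (least squares).

    Illustrative only: the paper aligns the reference view's monocular
    (relative) depth to the metric depth range before updating depth
    candidates for edge regions.
    """
    d = mono_depth[mask]                         # valid monocular depths, [N]
    t = mvs_depth[mask]                          # corresponding metric depths, [N]
    A = np.stack([d, np.ones_like(d)], axis=1)   # [N, 2] design matrix for (s, b)
    (s, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return s * mono_depth + b

def relative_consistency_loss(pred_depth, mono_depth, mask):
    """L1 distance between median-normalized depth maps.

    A scale/shift-invariant surrogate; the paper's actual loss
    formulation may differ.
    """
    def normalize(d):
        v = d[mask]
        med = np.median(v)
        return (d - med) / (np.mean(np.abs(v - med)) + 1e-8)
    return np.mean(np.abs(normalize(pred_depth)[mask] - normalize(mono_depth)[mask]))
```

Because the normalization removes any positive affine transform, a prediction that agrees with the monocular prior up to scale and shift incurs (near-)zero loss, which is exactly the kind of relative supervision a monocular prior can provide.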
Related papers
- MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction [45.70946415376022]
Monocular depth priors have been widely adopted by neural rendering in multi-view based tasks such as 3D reconstruction and novel view synthesis. Current methods treat the entire estimated depth map indiscriminately, and use it as ground truth supervision. We propose MonoInstance, a general approach that explores the uncertainty of monocular depths to provide enhanced geometric priors.
arXiv Detail & Related papers (2025-03-24T05:58:06Z)
- Multi-view Reconstruction via SfM-guided Monocular Depth Estimation [92.89227629434316]
We present a new method for multi-view geometric reconstruction.<n>We incorporate SfM information, a strong multi-view prior, into the depth estimation process.<n>Our method significantly improves the quality of depth estimation compared to previous monocular depth estimation works.
arXiv Detail & Related papers (2025-03-18T17:54:06Z)
- Monocular Visual-Inertial Depth Estimation [66.71452943981558]
We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry.
Our approach performs global scale and shift alignment against sparse metric depth, followed by learning-based dense alignment.
We evaluate on the TartanAir and VOID datasets, observing up to 30% reduction in RMSE with dense scale alignment.
arXiv Detail & Related papers (2023-03-21T18:47:34Z)
- Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning [22.828829870704006]
Self-supervised monocular methods can efficiently learn depth information of weakly textured surfaces or reflective objects.
In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo.
We propose MOVEDepth, which exploits the MOnocular cues and VElocity guidance to improve multi-frame depth learning.
arXiv Detail & Related papers (2022-08-19T06:32:06Z)
- Improving Monocular Visual Odometry Using Learned Depth [84.05081552443693]
We propose a framework to exploit monocular depth estimation for improving visual odometry (VO).
The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.
Compared with current learning-based VO methods, our method demonstrates a stronger generalization ability to diverse scenes.
arXiv Detail & Related papers (2022-04-04T06:26:46Z)
- A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo [41.527018997251744]
We introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps.
The key to our approach is a novel solver that iteratively solves for per-view depth map and normal map.
Our proposed solver consistently improves the depth quality over both conventional and deep learning based MVS pipelines.
arXiv Detail & Related papers (2022-01-19T14:08:45Z)
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
- Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction [72.30870535815258]
Classical monocular SLAM and CNNs for monocular depth prediction represent two largely disjoint approaches towards building a 3D map of the surrounding environment.
We propose a joint narrow and wide baseline based self-improving framework, where on the one hand the CNN-predicted depth is leveraged to perform pseudo RGB-D feature-based SLAM.
On the other hand, the bundle-adjusted 3D scene structures and camera poses from the more principled geometric SLAM are injected back into the depth network through novel wide baseline losses.
arXiv Detail & Related papers (2020-04-22T16:31:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.