IA-MVS: Instance-Focused Adaptive Depth Sampling for Multi-View Stereo
- URL: http://arxiv.org/abs/2505.12714v1
- Date: Mon, 19 May 2025 05:11:39 GMT
- Title: IA-MVS: Instance-Focused Adaptive Depth Sampling for Multi-View Stereo
- Authors: Yinzhe Wang, Yiwen Xiao, Hu Wang, Yiping Xu, Yan Tian,
- Abstract summary: Multi-view stereo (MVS) models based on progressive depth hypothesis narrowing have made remarkable advancements. Existing methods have not fully exploited the fact that the depth coverage of an individual instance is smaller than that of the entire scene. In this paper, we propose Instance-Adaptive MVS (IA-MVS), which enhances the precision of depth estimation by narrowing the depth hypothesis range and conducting refinement on each instance.
- Score: 4.804216403519042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-view stereo (MVS) models based on progressive depth hypothesis narrowing have made remarkable advancements. However, existing methods have not fully exploited the fact that the depth coverage of an individual instance is smaller than that of the entire scene, which restricts further improvements in depth estimation precision. Moreover, inevitable deviations in the initial stage accumulate as the process advances. In this paper, we propose Instance-Adaptive MVS (IA-MVS). It enhances the precision of depth estimation by narrowing the depth hypothesis range and conducting refinement on each instance. Additionally, a filtering mechanism based on intra-instance depth continuity priors is incorporated to boost robustness. Furthermore, recognizing that existing confidence estimation can degrade IA-MVS performance on point clouds, we have developed a detailed mathematical model for confidence estimation based on conditional probability. The proposed method can be widely applied to models based on MVSNet without imposing extra training burdens. Our method achieves state-of-the-art performance on the DTU benchmark. The source code is available at https://github.com/KevinWang73106/IA-MVS.
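The core idea of narrowing the depth hypothesis range per instance can be sketched as follows. This is a minimal illustration, not the paper's implementation: given a coarse depth map and an instance segmentation mask (both assumed available from an earlier stage), each instance gets hypotheses spanning only its own depth coverage plus a small margin, rather than the full scene range. Function and parameter names here are hypothetical.

```python
import numpy as np

def instance_depth_hypotheses(coarse_depth, instance_mask, n_hyp=8, margin=0.05):
    """For each instance, build a narrowed set of depth hypotheses
    covering only that instance's own depth range (plus a margin),
    instead of the full scene range."""
    hyps = {}
    for inst_id in np.unique(instance_mask):
        d = coarse_depth[instance_mask == inst_id]
        lo, hi = d.min(), d.max()
        pad = margin * (hi - lo + 1e-6)  # small slack around the instance range
        hyps[inst_id] = np.linspace(lo - pad, hi + pad, n_hyp)
    return hyps

# Toy scene: a near object (instance 0) and a far object (instance 1).
coarse = np.array([[1.0, 1.1, 5.0],
                   [1.2, 1.0, 5.2]])
mask = np.array([[0, 0, 1],
                 [0, 0, 1]])
h = instance_depth_hypotheses(coarse, mask)
```

With the full-scene range (1.0 to 5.2) the same hypothesis budget would be spread far more thinly; per-instance narrowing concentrates it where each object actually lies.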
Related papers
- Multi-view Reconstruction via SfM-guided Monocular Depth Estimation [92.89227629434316]
We present a new method for multi-view geometric reconstruction. We incorporate SfM information, a strong multi-view prior, into the depth estimation process. Our method significantly improves the quality of depth estimation compared to previous monocular depth estimation works.
arXiv Detail & Related papers (2025-03-18T17:54:06Z) - Revisiting Gradient-based Uncertainty for Monocular Depth Estimation [10.502852645001882]
We introduce gradient-based uncertainty estimation for monocular depth estimation models. We demonstrate that our approach is effective in determining the uncertainty without re-training. In particular, for models trained with monocular sequences and therefore most prone to uncertainty, our method outperforms related approaches.
arXiv Detail & Related papers (2025-02-09T17:21:41Z) - ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
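ScaleDepth's decomposition can be illustrated with a minimal sketch: metric depth is the product of a predicted scene scale and a normalized relative depth map. The arrays and the scale value below are illustrative toy values, not the paper's API or outputs.

```python
import numpy as np

# Sketch of the ScaleDepth decomposition: metric = scene_scale * relative.
# In the paper both factors come from learned modules; here they are
# stand-in values to show how the two predictions combine.
relative_depth = np.array([[0.1, 0.5],
                           [0.9, 1.0]])  # scale-free, normalized to [0, 1]
scene_scale = 12.0                        # one scalar per scene (metres)
metric_depth = scene_scale * relative_depth
```

Separating the two factors lets the relative-depth branch generalize across scenes whose absolute scales differ, with the scale branch absorbing the per-scene magnitude.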
arXiv Detail & Related papers (2024-07-11T05:11:56Z) - Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving [22.58849429006898]
Current multi-view depth estimation methods, as well as single-view and multi-view fusion methods, fail under noisy pose settings.
We propose a single-view and multi-view fused depth estimation system, which adaptively integrates high-confidence multi-view and single-view results.
Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing.
arXiv Detail & Related papers (2024-03-12T11:18:35Z) - One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception [59.37727312705997]
This paper proposes to decompose the complicated 3D volume representation learning into a sequence of generative steps.
Considering the recent advances achieved by strong generative diffusion models, we introduce a multi-step learning framework, dubbed VPD.
For the SSC task, our work stands out as the first to surpass LiDAR-based methods on the Semantic KITTI dataset.
arXiv Detail & Related papers (2023-06-22T05:55:53Z) - A technique to jointly estimate depth and depth uncertainty for unmanned aerial vehicles [11.725077632618879]
M4Depth is a state-of-the-art depth estimation method designed for unmanned aerial vehicle (UAV) applications.
We show how M4Depth can be enhanced to perform joint depth and uncertainty estimation.
arXiv Detail & Related papers (2023-05-31T12:13:45Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method's accuracy (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - Monocular Visual-Inertial Depth Estimation [66.71452943981558]
We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry.
Our approach performs global scale and shift alignment against sparse metric depth, followed by learning-based dense alignment.
We evaluate on the TartanAir and VOID datasets, observing up to 30% reduction in RMSE with dense scale alignment.
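The "global scale and shift alignment against sparse metric depth" step described above is commonly a least-squares fit of metric ≈ s·pred + t over the sparse valid pixels. The sketch below shows that standard formulation under this assumption; the paper's exact alignment and any names used here are not from the source.

```python
import numpy as np

def align_scale_shift(pred, sparse_gt, valid):
    """Fit metric ~= s * pred + t by least squares over the sparse
    valid pixels (a common global alignment step; the paper's exact
    formulation may differ)."""
    x = pred[valid]
    y = sparse_gt[valid]
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# Toy example: ground truth is exactly 3 * pred + 1, observed at 3 sparse points.
pred = np.array([0.5, 1.0, 2.0, 4.0])
gt = 3.0 * pred + 1.0
valid = np.array([True, True, False, True])
s, t = align_scale_shift(pred, gt, valid)
```

A learning-based dense alignment, as the summary notes, can then correct the spatially varying residual that a single global (s, t) cannot capture.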
arXiv Detail & Related papers (2023-03-21T18:47:34Z) - SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - DS-MVSNet: Unsupervised Multi-view Stereo via Depth Synthesis [11.346448410152844]
In this paper, we propose DS-MVSNet, an end-to-end unsupervised MVS framework with source depth synthesis.
To mine the information in the probability volume, we synthesize the source depths by splatting the probability volume and depth hypotheses onto the source views.
On the other hand, we utilize the source depths to render the reference images and propose depth consistency loss and depth smoothness loss.
arXiv Detail & Related papers (2022-08-13T15:25:51Z) - Improving Monocular Visual Odometry Using Learned Depth [84.05081552443693]
We propose a framework to exploit monocular depth estimation for improving visual odometry (VO).
The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.
Compared with current learning-based VO methods, our method demonstrates a stronger generalization ability to diverse scenes.
arXiv Detail & Related papers (2022-04-04T06:26:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.