Boosting Multi-View Stereo with Depth Foundation Model in the Absence of Real-World Labels
- URL: http://arxiv.org/abs/2504.11845v1
- Date: Wed, 16 Apr 2025 08:07:09 GMT
- Title: Boosting Multi-View Stereo with Depth Foundation Model in the Absence of Real-World Labels
- Authors: Jie Zhu, Bo Peng, Zhe Zhang, Bingzheng Liu, Jianjun Lei
- Abstract summary: A novel method termed DFM-MVS is proposed to leverage the depth foundation model to generate the effective depth prior. Specifically, a depth prior-based pseudo-supervised training mechanism is developed to simulate realistic stereo correspondences. Experimental results on DTU and Tanks & Temples datasets demonstrate that the proposed DFM-MVS significantly outperforms existing MVS methods without using real-world labels.
- Score: 23.36740525849356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based Multi-View Stereo (MVS) methods have made remarkable progress in recent years. However, how to effectively train the network without using real-world labels remains a challenging problem. In this paper, driven by the recent advancements of vision foundation models, a novel method termed DFM-MVS is proposed to leverage the depth foundation model to generate the effective depth prior, so as to boost MVS in the absence of real-world labels. Specifically, a depth prior-based pseudo-supervised training mechanism is developed to simulate realistic stereo correspondences using the generated depth prior, thereby constructing effective supervision for the MVS network. Besides, a depth prior-guided error correction strategy is presented to leverage the depth prior as guidance to mitigate the error propagation problem inherent in the widely-used coarse-to-fine network structure. Experimental results on DTU and Tanks & Temples datasets demonstrate that the proposed DFM-MVS significantly outperforms existing MVS methods without using real-world labels.
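The abstract does not spell out how a depth prior yields stereo correspondences, so the following is only a minimal sketch of the standard geometry that such a step could build on, not the authors' implementation. It assumes a pinhole camera model; the function name `project_with_depth` and the parameters `K_ref`, `K_src`, `R`, and `t` are hypothetical names chosen for illustration. Given a per-pixel depth map for a reference view, each pixel is back-projected to 3D and reprojected into a source view, which gives a dense pixel-to-pixel correspondence.

```python
import numpy as np

def project_with_depth(depth, K_ref, K_src, R, t):
    """Map every reference pixel to source-view pixel coordinates.

    depth : (H, W) depth map in the reference camera (stands in for a depth prior).
    K_ref, K_src : (3, 3) pinhole intrinsics of the two views.
    R, t : rotation (3, 3) and translation (3,) taking reference-camera
           coordinates to source-camera coordinates.
    Returns an (H, W, 2) array of (x, y) pixel coordinates in the source view.
    """
    H, W = depth.shape
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates, shape (H, W, 3).
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    # Back-project pixels to rays with unit z, then scale by depth.
    rays = pix @ np.linalg.inv(K_ref).T
    pts_ref = rays * depth[..., None]
    # Move the 3D points into the source camera frame and project them.
    pts_src = pts_ref @ R.T + t
    proj = pts_src @ K_src.T
    return proj[..., :2] / proj[..., 2:3]

# Toy example: a fronto-parallel plane at depth 2 and a small sideways shift.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
uv = project_with_depth(depth, K, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

In this toy setup every pixel shifts horizontally by focal length times baseline over depth, 100 * 0.1 / 2 = 5 pixels, which is the usual stereo disparity relation.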
Related papers
- VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation [8.66253032039513]
VistaDepth is a novel framework that integrates adaptive frequency-domain feature enhancements with an adaptive weight-balancing mechanism.
VistaDepth achieves state-of-the-art performance among diffusion-based MDE techniques, particularly excelling in the accurate reconstruction of distant regions.
arXiv Detail & Related papers (2025-04-21T13:30:51Z)
- Multi-view Reconstruction via SfM-guided Monocular Depth Estimation [92.89227629434316]
We present a new method for multi-view geometric reconstruction.
We incorporate SfM information, a strong multi-view prior, into the depth estimation process.
Our method significantly improves the quality of depth estimation compared to previous monocular depth estimation works.
arXiv Detail & Related papers (2025-03-18T17:54:06Z)
- MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo [60.75684891484619]
We introduce MVSFormer++, a method that maximizes the inherent characteristics of attention to enhance various components of the MVS pipeline.
We employ different attention mechanisms for the feature encoder and cost volume regularization, focusing on feature and spatial aggregations respectively.
Comprehensive experiments on DTU, Tanks-and-Temples, BlendedMVS, and ETH3D validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-01-22T03:22:49Z)
- Re-Evaluating LiDAR Scene Flow for Autonomous Driving [80.37947791534985]
Popular benchmarks for self-supervised LiDAR scene flow have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns.
We evaluate a suite of top methods on a suite of real-world datasets.
We show that despite the emphasis placed on learning, most performance gains are caused by pre- and post-processing steps.
arXiv Detail & Related papers (2023-04-04T22:45:50Z)
- Sparse Depth-Guided Attention for Accurate Depth Completion: A Stereo-Assisted Monitored Distillation Approach [7.902840502973506]
We introduce a stereo-based model as a teacher model to improve the accuracy of the student model for depth completion.
To provide self-supervised information, we also employ multi-view depth consistency and multi-scale minimum reprojection.
arXiv Detail & Related papers (2023-03-28T09:23:19Z)
- DS-MVSNet: Unsupervised Multi-view Stereo via Depth Synthesis [11.346448410152844]
In this paper, we propose the DS-MVSNet, an end-to-end unsupervised MVS structure with the source depths synthesis.
To mine the information in probability volume, we creatively synthesize the source depths by splattering the probability volume and depth hypotheses to source views.
On the other hand, we utilize the source depths to render the reference images and propose depth consistency loss and depth smoothness loss.
arXiv Detail & Related papers (2022-08-13T15:25:51Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
- Pyramid Frequency Network with Spatial Attention Residual Refinement Module for Monocular Depth Estimation [4.397981844057195]
Deep-learning approaches to depth estimation are rapidly advancing, offering superior performance over existing methods.
In this work, a Pyramid Frequency Network with Spatial Attention Residual Refinement Module is proposed to deal with the weak robustness of existing deep-learning methods.
PFN achieves better visual accuracy than state-of-the-art methods in both indoor and outdoor scenes on Make3D, KITTI depth, and NYUv2 datasets.
arXiv Detail & Related papers (2022-04-05T17:48:26Z)
- DDL-MVS: Depth Discontinuity Learning for MVS Networks [0.5735035463793007]
We propose depth discontinuity learning for MVS methods, which further improves accuracy while retaining the completeness of the reconstruction.
We validate our idea and demonstrate that our strategies can be easily integrated into the existing learning-based MVS pipeline.
arXiv Detail & Related papers (2022-03-02T20:25:31Z)
- Low-light Image Enhancement by Retinex Based Algorithm Unrolling and Adjustment [50.13230641857892]
We propose a new deep learning framework for the low-light image enhancement (LIE) problem.
The proposed framework contains a decomposition network inspired by algorithm unrolling, and adjustment networks considering both global brightness and local brightness sensitivity.
Experiments on a series of typical LIE datasets demonstrated the effectiveness of the proposed method, both quantitatively and visually, as compared with existing methods.
arXiv Detail & Related papers (2022-02-12T03:59:38Z)
- Digging into Uncertainty in Self-supervised Multi-view Stereo [57.04768354383339]
We propose a novel Uncertainty reduction Multi-view Stereo (UMVS) framework for self-supervised learning.
Our framework achieves the best performance among unsupervised MVS methods, with competitive performance with its supervised opponents.
arXiv Detail & Related papers (2021-08-30T02:53:08Z)
- Continual Adaptation for Deep Stereo [52.181067640300014]
We propose a continual adaptation paradigm for deep stereo networks designed to deal with challenging and ever-changing environments.
In our paradigm, the learning signals needed to continuously adapt models online can be sourced from self-supervision via right-to-left image warping or from traditional stereo algorithms.
Our network architecture and adaptation algorithms realize the first real-time self-adaptive deep stereo system.
arXiv Detail & Related papers (2020-07-10T08:15:58Z)
- Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution [58.626803922196146]
We argue that DSR models trained on synthetic dataset are restrictive and not effective in dealing with real-world DSR tasks.
We make two contributions in tackling real-world degradation of different depth sensors.
We propose a new framework for real-world DSR, which consists of four modules.
arXiv Detail & Related papers (2020-06-02T09:12:23Z)