V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints
- URL: http://arxiv.org/abs/2308.08715v1
- Date: Thu, 17 Aug 2023 00:39:56 GMT
- Title: V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints
- Authors: Nathaniel Burgdorfer, Philippos Mordohai
- Abstract summary: We introduce a learning-based depth map fusion framework that accepts a set of depth and confidence maps generated by a Multi-View Stereo (MVS) algorithm as input and improves them.
We also introduce a depth search window estimation sub-network trained jointly with the larger fusion sub-network to reduce the depth hypothesis search space along each ray.
Our method learns to model depth consensus and violations of visibility constraints directly from the data.
- Score: 6.7197802356130465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a learning-based depth map fusion framework that accepts a set
of depth and confidence maps generated by a Multi-View Stereo (MVS) algorithm
as input and improves them. This is accomplished by integrating volumetric
visibility constraints that encode long-range surface relationships across
different views into an end-to-end trainable architecture. We also introduce a
depth search window estimation sub-network trained jointly with the larger
fusion sub-network to reduce the depth hypothesis search space along each ray.
Our method learns to model depth consensus and violations of visibility
constraints directly from the data, effectively removing the need for
fine-tuning fusion parameters. Extensive experiments on MVS datasets show
substantial improvements in the accuracy of the output fused depth and
confidence maps.
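To make the described pipeline concrete, below is a minimal numpy sketch of the kind of interface such a fusion stage exposes: per-view depth and confidence maps go in, a per-pixel search window bounds the depth hypotheses along each ray, and a consensus score selects the fused depth. The window rule, the Gaussian voting, and the function names are illustrative assumptions, not the paper's learned sub-networks.

```python
import numpy as np

def estimate_search_window(depths, confidences, base_width=0.05):
    """Toy stand-in for a window-estimation sub-network: center the
    per-pixel hypothesis range on the confidence-weighted mean depth and
    widen it where the input views disagree."""
    w = confidences / (confidences.sum(axis=0, keepdims=True) + 1e-8)
    center = (w * depths).sum(axis=0)
    spread = np.sqrt((w * (depths - center) ** 2).sum(axis=0))
    half = base_width * center + spread
    return center - half, center + half

def fuse_depths(depths, confidences, num_hypotheses=64):
    """Confidence-weighted consensus over depth hypotheses sampled inside
    the estimated window (illustrative; the paper learns this fusion)."""
    lo, hi = estimate_search_window(depths, confidences)
    t = np.linspace(0.0, 1.0, num_hypotheses)[:, None, None]
    hyp = lo[None] + t * (hi - lo)[None]                 # (K, H, W)
    sigma = 0.01 * hyp + 1e-6                            # vote bandwidth
    votes = np.exp(-(hyp[:, None] - depths[None]) ** 2
                   / (2.0 * sigma[:, None] ** 2))        # (K, V, H, W)
    support = (votes * confidences[None]).sum(axis=1)    # (K, H, W)
    best = support.argmax(axis=0)
    fused = np.take_along_axis(hyp, best[None], axis=0)[0]
    conf = np.take_along_axis(support, best[None], axis=0)[0]
    return fused, conf / (confidences.sum(axis=0) + 1e-8)

# Four views of a surface at depth 2.0; one view violates the consensus.
rng = np.random.default_rng(0)
depths = 2.0 + 0.01 * rng.standard_normal((4, 8, 8))
depths[3] += 0.5
fused, conf = fuse_depths(depths, np.ones_like(depths))
print(round(float(fused.mean()), 2))   # ~2.0: consensus ignores the outlier
```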
Related papers
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- Deep Cost Ray Fusion for Sparse Depth Video Completion [29.512567810302368]
We present a learning-based framework for sparse depth video completion.
We introduce RayFusion, which effectively leverages the attention mechanism for each pair of overlapped rays in adjacent cost volumes.
Our framework consistently outperforms or rivals state-of-the-art approaches on diverse indoor and outdoor datasets.
arXiv Detail & Related papers (2024-09-23T11:42:16Z)
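The RayFusion entry above centers on attention between overlapped rays in adjacent cost volumes. A toy single-head cross-attention between two rays' per-depth-bin features might look like the sketch below; the feature sizes and random projection matrices are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ray_cross_attention(ray_a, ray_b, Wq, Wk, Wv):
    """Update ray_a's per-depth-bin features with evidence from the
    overlapped ray_b in an adjacent cost volume (scaled dot-product
    attention, single head, residual connection)."""
    q, k, v = ray_a @ Wq, ray_b @ Wk, ray_b @ Wv     # (D, C) each
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # attend over b's bins
    return ray_a + attn @ v

rng = np.random.default_rng(0)
D, C = 64, 16                           # depth bins per ray, channels
ray_a = rng.standard_normal((D, C))     # ray features in cost volume t
ray_b = rng.standard_normal((D, C))     # the overlapped ray in volume t+1
Wq, Wk, Wv = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
print(ray_cross_attention(ray_a, ray_b, Wq, Wk, Wv).shape)   # (64, 16)
```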
- Progressive Depth Decoupling and Modulating for Flexible Depth Completion [28.693100885012008]
Image-guided depth completion aims at generating a dense depth map from sparse LiDAR data and an RGB image.
Recent methods have shown promising performance by reformulating it as a classification problem with two sub-tasks: depth discretization and probability prediction.
We propose a progressive depth decoupling and modulating network, which incrementally decouples the depth range into bins and adaptively generates multi-scale dense depth maps.
arXiv Detail & Related papers (2024-05-15T13:45:33Z)
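The classification reformulation in the entry above is commonly decoded with a probability-weighted average over discretized depth bins. A minimal sketch, assuming a uniform bin layout (the paper's bins are adaptively generated):

```python
import numpy as np

def depth_from_bins(logits, depth_min=0.5, depth_max=10.0):
    """Decode classification-style depth: softmax over discretized bins,
    then a probability-weighted average of the (assumed uniform) bin
    centers yields a continuous, sub-bin-accurate depth."""
    centers = np.linspace(depth_min, depth_max, logits.shape[0])  # (B,)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)                      # (B, H, W)
    return np.tensordot(centers, probs, axes=1)                   # (H, W)

# Equal mass on two adjacent bins decodes to a depth between their centers.
logits = np.zeros((8, 1, 1))
logits[3, 0, 0] = logits[4, 0, 0] = 6.0
print(round(float(depth_from_bins(logits)[0, 0]), 2))  # ~5.25, between bins
```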
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss that steers a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
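The confidence map in the entry above steers the fusion of modality-specific depths. The paper learns that fusion end-to-end; a confidence-normalized blend conveys the idea in a few lines (function name and modalities are illustrative assumptions):

```python
import numpy as np

def fuse_modalities(depth_a, depth_b, conf_a, conf_b):
    """Confidence-normalized blend of two modality-specific depth maps;
    this simple convex combination only illustrates confidence steering,
    it is not the paper's learned fusion network."""
    w_a = conf_a / (conf_a + conf_b + 1e-8)
    return w_a * depth_a + (1.0 - w_a) * depth_b

# A sparse but trusted sensor wins where it has support; the dense camera
# estimate fills in everywhere else.
depth_cam = np.full((4, 4), 3.0)
depth_sparse = np.full((4, 4), 2.0)
conf_cam = np.full((4, 4), 0.5)
conf_sparse = np.zeros((4, 4))
conf_sparse[0, 0] = 1.0
fused = fuse_modalities(depth_sparse, depth_cam, conf_sparse, conf_cam)
print(fused[0, 0], fused[1, 1])   # ~2.33 at the trusted point, 3.0 elsewhere
```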
- Self-Supervised Depth Completion Guided by 3D Perception and Geometry Consistency [17.68427514090938]
This paper explores the utilization of 3D perceptual features and multi-view geometry consistency to devise a high-precision self-supervised depth completion method.
Experiments on benchmark datasets of NYU-Depthv2 and VOID demonstrate that the proposed model achieves the state-of-the-art depth completion performance.
arXiv Detail & Related papers (2023-12-23T14:19:56Z)
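Multi-view geometry consistency, as used by the entry above, typically means a reprojection check between views. A standard sketch with an assumed pinhole intrinsic matrix and relative pose (not the paper's specific formulation):

```python
import numpy as np

def reprojection_consistency(depth_ref, depth_src, K, T_src_ref, thresh=0.05):
    """Common multi-view consistency check in self-supervised depth methods:
    back-project reference pixels, transform them into the source frame,
    and compare the projected depth against the source depth map."""
    H, W = depth_ref.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    pts = np.linalg.inv(K) @ pix * depth_ref.reshape(1, -1)   # 3D in ref
    pts = T_src_ref[:3, :3] @ pts + T_src_ref[:3, 3:4]        # 3D in src
    proj = K @ pts
    x = np.round(proj[0] / proj[2]).astype(int)
    y = np.round(proj[1] / proj[2]).astype(int)
    ok = (x >= 0) & (x < W) & (y >= 0) & (y < H) & (proj[2] > 0)
    consistent = np.zeros(H * W, dtype=bool)
    d_src = depth_src[y[ok], x[ok]]
    consistent[ok] = np.abs(d_src - proj[2][ok]) / proj[2][ok] < thresh
    return consistent.reshape(H, W)

# Identity pose: a depth map is trivially consistent with itself.
K = np.array([[50.0, 0, 16], [0, 50.0, 16], [0, 0, 1]])
depth = np.full((32, 32), 2.0)
print(reprojection_consistency(depth, depth, K, np.eye(4)).all())  # True
```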
- Multi-resolution Monocular Depth Map Fusion by Self-supervised Gradient-based Composition [14.246972408737987]
We propose a novel depth map fusion module that combines the advantages of estimates produced from multi-resolution inputs.
Our lightweight depth fusion is one-shot and runs in real-time, making our method 80X faster than a state-of-the-art depth fusion method.
arXiv Detail & Related papers (2022-12-03T05:13:50Z)
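One simple instance of the gradient-style composition described in the entry above keeps the low frequencies of the upsampled low-resolution estimate and the high-frequency detail of the high-resolution one. The box filter and the frequency split below are illustrative choices, not the paper's learned module:

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box filter used here as a cheap low-pass."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kern = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, kern, "valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kern, "valid"), 0, rows)

def compose(depth_low_up, depth_high, k=5):
    """Low frequencies from the (upsampled) low-resolution estimate plus
    the high-frequency residual of the high-resolution estimate."""
    return box_blur(depth_low_up, k) + depth_high - box_blur(depth_high, k)

rng = np.random.default_rng(0)
base = np.linspace(1.0, 3.0, 64)[None, :] * np.ones((64, 1))   # smooth scene
detail = 0.05 * rng.standard_normal((64, 64))                  # fine structure
low_up = base                    # globally consistent but detail-free
high = base + detail + 0.5       # sharp but with a global depth drift
fused = compose(low_up, high)
print(round(float(np.abs(fused - (base + detail)).mean()), 3))  # small error:
# the drift is removed while the fine detail is kept
```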
- Improving Monocular Visual Odometry Using Learned Depth [84.05081552443693]
We propose a framework that exploits monocular depth estimation to improve visual odometry (VO).
The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.
Compared with current learning-based VO methods, our method demonstrates a stronger generalization ability to diverse scenes.
arXiv Detail & Related papers (2022-04-04T06:26:46Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods by substantial margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
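The long-range correlation that DepthFormer models with attention amounts to letting every spatial location of a feature map attend to every other one. A toy single-head global self-attention, with placeholder sizes and random projections rather than the paper's actual Transformer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(feat, Wq, Wk, Wv):
    """Every spatial location attends to every other one, so a pixel's
    depth feature can draw on arbitrarily distant context (single head,
    residual update)."""
    H, W, C = feat.shape
    tokens = feat.reshape(H * W, C)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (HW, HW)
    return (tokens + attn @ v).reshape(H, W, C)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
Wq, Wk, Wv = (0.1 * rng.standard_normal((16, 16)) for _ in range(3))
print(global_self_attention(feat, Wq, Wk, Wv).shape)  # (8, 8, 16)
```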
- A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo [41.527018997251744]
We introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps.
The key to our approach is a novel solver that iteratively solves for per-view depth map and normal map.
Our proposed solver consistently improves the depth quality over both conventional and deep learning based MVS pipelines.
arXiv Detail & Related papers (2022-01-19T14:08:45Z)
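An iterative depth-and-normal solver like the one summarized above can be caricatured as alternating two updates: estimate normals from the current depth, then re-predict each depth from its neighbors' tangent planes. Both update rules below are illustrative toys, not the paper's solver:

```python
import numpy as np

def normals_from_depth(depth):
    """Per-pixel normals from depth gradients (orthographic toy camera;
    a real solver would use the full perspective geometry)."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def refine_depth(depth, normals):
    """Re-predict each depth as the average of the depths its four
    neighbors' tangent planes extrapolate to that pixel."""
    H, W = depth.shape
    dp = np.pad(depth, 1, mode="edge")
    nn = np.pad(normals, ((1, 1), (1, 1), (0, 0)), mode="edge")
    preds = []
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        d_nb = dp[1 + dy:H + 1 + dy, 1 + dx:W + 1 + dx]
        n_nb = nn[1 + dy:H + 1 + dy, 1 + dx:W + 1 + dx]
        # plane step from the neighbor back to the pixel (offset -dy, -dx)
        step = (n_nb[..., 0] * dx + n_nb[..., 1] * dy) / n_nb[..., 2]
        preds.append(d_nb + step)
    return np.mean(preds, axis=0)

truth = np.linspace(1.0, 2.0, 32)[None, :] * np.ones((32, 1))  # tilted plane
depth = truth + 0.02 * np.random.default_rng(0).standard_normal(truth.shape)
for _ in range(5):        # alternate normal estimation and depth refinement
    depth = refine_depth(depth, normals_from_depth(depth))
print(round(float(np.abs(depth - truth).mean()), 3))  # error drops well
# below the injected noise level while the tilt survives
```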
- Light Field Reconstruction via Deep Adaptive Fusion of Hybrid Lenses [67.01164492518481]
This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses.
We propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input.
Our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.
arXiv Detail & Related papers (2021-02-14T06:44:47Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.