Adaptive Fusion of Single-View and Multi-View Depth for Autonomous
Driving
- URL: http://arxiv.org/abs/2403.07535v1
- Date: Tue, 12 Mar 2024 11:18:35 GMT
- Title: Adaptive Fusion of Single-View and Multi-View Depth for Autonomous
Driving
- Authors: JunDa Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin
Yang
- Abstract summary: Current multi-view depth estimation methods, as well as single-view and multi-view fusion methods, fail when given noisy camera poses.
We propose a single-view and multi-view fused depth estimation system that adaptively integrates high-confidence multi-view and single-view results.
Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing.
- Score: 22.58849429006898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view depth estimation has achieved impressive performance over various
benchmarks. However, almost all current multi-view systems rely on given ideal
camera poses, which are unavailable in many real-world scenarios, such as
autonomous driving. In this work, we propose a new robustness benchmark to
evaluate depth estimation systems under various noisy pose settings.
Surprisingly, we find that current multi-view depth estimation methods, as well
as single-view and multi-view fusion methods, fail when given noisy poses. To
address this challenge, we propose a single-view and multi-view fused depth
estimation system that adaptively integrates high-confidence multi-view and
single-view results for both robust and accurate depth estimation. The
adaptive fusion module performs fusion by dynamically selecting high-confidence
regions between the two branches based on a warping confidence map. Thus, the
system tends to choose the more reliable branch when facing textureless scenes,
inaccurate calibration, dynamic objects, and other challenging conditions. Our
method outperforms state-of-the-art
multi-view and fusion methods under robustness testing. Furthermore, we achieve
state-of-the-art performance on challenging benchmarks (KITTI and DDAD) when
given accurate pose estimations. Project website:
https://github.com/Junda24/AFNet/.
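
The abstract names the fusion mechanism but gives no detail here, so below is a
minimal, hypothetical NumPy sketch of the general idea: derive a per-pixel
confidence map from how well a source image warps into the reference view, then
let that map arbitrate between the multi-view and single-view depth branches.
The function names, the photometric confidence proxy, and the blending rule are
our assumptions, not AFNet's actual implementation.

```python
import numpy as np

def warping_confidence(ref_img, warped_src_img, sigma=0.1):
    # Photometric consistency: where the source image, warped into the
    # reference view with the multi-view depth and pose, matches the
    # reference image, the multi-view branch is likely reliable.
    err = np.abs(ref_img - warped_src_img).mean(axis=-1)
    return np.exp(-err / sigma)  # good match -> confidence near 1

def fuse_depth(mv_depth, sv_depth, conf, hard_thresh=None):
    # Per-pixel fusion: soft blend by default, or hard per-pixel
    # selection of one branch when a threshold is given.
    if hard_thresh is not None:
        return np.where(conf > hard_thresh, mv_depth, sv_depth)
    return conf * mv_depth + (1.0 - conf) * sv_depth

# Toy usage on 4x4 inputs.
rng = np.random.default_rng(0)
ref = rng.random((4, 4, 3))
warped = np.clip(ref + rng.normal(0.0, 0.05, (4, 4, 3)), 0.0, 1.0)
mv_depth, sv_depth = rng.random((4, 4)), rng.random((4, 4))
fused = fuse_depth(mv_depth, sv_depth, warping_confidence(ref, warped))
```

The hard threshold mirrors the "choose the more reliable branch" behaviour
described in the abstract; the soft blend is the differentiable variant.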
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that removes the need for a depth-range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate, for each pixel, the matching quality of points sampled along the epipolar line in the source image.
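
As a rough illustration of scoring sampled points along an epipolar line, here
is a simplified single-pixel stand-in for the attention-based aggregation this
entry describes; the dot-product score and all names are our assumptions, not
the paper's actual MDA module.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def epipolar_attention(ref_feat, src_feats):
    # ref_feat: (C,) feature of one reference pixel.
    # src_feats: (N, C) features at N points sampled along the epipolar
    # line in the source image. Scores act as per-sample matching
    # quality; the context is their attention-weighted aggregation.
    scores = src_feats @ ref_feat / np.sqrt(ref_feat.shape[0])
    weights = softmax(scores)
    return weights, weights @ src_feats

# Toy usage: 8 epipolar samples with 16-dim features.
rng = np.random.default_rng(1)
weights, context = epipolar_attention(
    rng.standard_normal(16), rng.standard_normal((8, 16)))
```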
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging
Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
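
The confidence-loss idea can be sketched generically: supervise the confidence
predictor so that it outputs high values exactly where the depth error is low.
The exponential target and L1 regression below are our assumptions; the
summary does not give the paper's actual formulation.

```python
import numpy as np

def confidence_loss(pred_conf, pred_depth, gt_depth, scale=1.0):
    # Derive a target confidence from the actual depth error (accurate
    # pixels -> target near 1), then regress the predictor toward it.
    err = np.abs(pred_depth - gt_depth)
    target_conf = np.exp(-err / scale)
    return np.abs(pred_conf - target_conf).mean()

# Toy usage.
rng = np.random.default_rng(2)
gt = rng.random((4, 4)) * 10.0
pred = gt + rng.normal(0.0, 0.5, (4, 4))
loss = confidence_loss(rng.random((4, 4)), pred, gt)
```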
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the most accurate on the KITTI depth-prediction benchmark leaderboard.
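
A minimal sketch of continuous per-pixel depth modeling via a Gaussian
negative log-likelihood; this is the univariate special case (the paper uses a
multivariate Gaussian) and the names are ours.

```python
import numpy as np

def gaussian_nll(mu, log_var, gt):
    # NLL of ground-truth depth under per-pixel N(mu, exp(log_var));
    # predicting log-variance keeps the variance positive and stable.
    var = np.exp(log_var)
    return 0.5 * (log_var + (gt - mu) ** 2 / var).mean()

# Toy usage: a network would output mu and log_var per pixel.
rng = np.random.default_rng(3)
gt = rng.random((4, 4)) * 10.0
mu = gt + rng.normal(0.0, 0.3, (4, 4))
loss = gaussian_nll(mu, np.full((4, 4), -1.0), gt)
```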
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- Multi-Frame Self-Supervised Depth Estimation with Multi-Scale Feature
Fusion in Dynamic Scenes [25.712707161201802]
Multi-frame methods improve monocular depth estimation over single-frame approaches.
Recent methods tend to propose complex architectures for feature matching and dynamic scenes.
We show that a simple learning framework, together with designed feature augmentation, leads to superior performance.
arXiv Detail & Related papers (2023-03-26T05:26:30Z)
- Multi-Camera Collaborative Depth Prediction via Consistent Structure
Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised
Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
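
A toy single-head version of cross-view attention over per-camera features;
the actual SurroundDepth transformer works on spatial feature maps with
learned projections, so this simplification is ours.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(feats):
    # feats: (V, C), one pooled feature per surrounding camera. Every
    # view attends to all views, so each camera's representation mixes
    # in context from its neighbours.
    attn = softmax(feats @ feats.T / np.sqrt(feats.shape[1]))
    return attn @ feats

# Toy usage: 6 surround cameras, 32-dim features.
fused = cross_view_attention(np.random.default_rng(4).standard_normal((6, 32)))
```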
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty
Estimation [34.8860186009308]
We propose a deep sensor fusion framework which estimates vehicle motion using both pose and uncertainty estimations from multiple on-board cameras.
We evaluate our approach on the publicly available, large scale autonomous vehicle dataset, nuScenes.
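
The classical baseline behind this idea is inverse-variance fusion of
per-camera motion estimates. The paper learns both the poses and the
uncertainties end to end; the hypothetical sketch below only shows the
weighting.

```python
import numpy as np

def fuse_motion(estimates, variances):
    # estimates: (K, 6) per-camera motion (translation + rotation vector);
    # variances: (K, 6) predicted uncertainties. Inverse-variance
    # weighting trusts low-uncertainty cameras. (Averaging rotation
    # vectors is only a small-rotation approximation.)
    w = 1.0 / variances
    return (w * estimates).sum(axis=0) / w.sum(axis=0)

# Toy usage: 4 on-board cameras.
rng = np.random.default_rng(5)
ego_motion = fuse_motion(rng.standard_normal((4, 6)),
                         rng.uniform(0.1, 1.0, (4, 6)))
```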
arXiv Detail & Related papers (2021-12-23T19:44:45Z)
- Multi-View Depth Estimation by Fusing Single-View Depth Probability with
Multi-View Geometry [25.003116148843525]
We propose MaGNet, a framework for fusing single-view depth probability with multi-view geometry.
MaGNet achieves state-of-the-art performance on ScanNet, 7-Scenes and KITTI.
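
A minimal sketch of the fusion idea as we read it from this summary: let the
single-view Gaussian restrict where multi-view matching looks, then weight the
matching scores by single-view probability. The candidate count, the k-sigma
range, and the fusion rule are our assumptions.

```python
import numpy as np

def depth_candidates(mu, sigma, n=5, k=2.0):
    # Sample depth hypotheses only where the single-view Gaussian
    # N(mu, sigma^2) puts mass: mu +/- k * sigma.
    offsets = np.linspace(-k, k, n)
    return mu[..., None] + sigma[..., None] * offsets

def select_depth(mv_scores, mu, sigma, cands):
    # Weight multi-view matching scores by the single-view probability
    # of each candidate, then pick the best hypothesis per pixel.
    sv_prob = np.exp(-0.5 * ((cands - mu[..., None]) / sigma[..., None]) ** 2)
    best = (mv_scores * sv_prob).argmax(axis=-1)
    return np.take_along_axis(cands, best[..., None], axis=-1)[..., 0]

# Toy usage; random scores stand in for real multi-view matching costs.
rng = np.random.default_rng(6)
mu, sigma = rng.random((4, 4)) * 10.0, np.full((4, 4), 0.5)
cands = depth_candidates(mu, sigma)
depth = select_depth(rng.random(cands.shape), mu, sigma, cands)
```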
arXiv Detail & Related papers (2021-12-15T14:56:53Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
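
Plane sweep stereo itself is standard, so here is a self-contained toy example
for a single joint: back-project the reference 2D detection at each
hypothesized depth, reproject into a second view, and keep the depth whose
reprojection best matches the detection there. The camera setup and names are
ours, not the paper's network.

```python
import numpy as np

def project(K, R, t, X):
    # Pinhole projection of a 3D point X into a camera (K, R, t).
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def plane_sweep_joint_depth(K, joint_px, det_other, R, t, depths):
    # Sweep depth planes along the reference pixel's viewing ray and
    # score each by reprojection distance in the other view.
    ray = np.linalg.inv(K) @ np.array([*joint_px, 1.0])
    errs = [np.linalg.norm(project(K, R, t, d * ray) - det_other)
            for d in depths]
    return depths[int(np.argmin(errs))]

# Toy usage: second camera shifted 0.5 m along x, true joint depth 4 m.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([-0.5, 0.0, 0.0])
true_X = (np.linalg.inv(K) @ np.array([300.0, 200.0, 1.0])) * 4.0
det2 = project(K, R, t, true_X)  # simulated detection in the second view
depth = plane_sweep_joint_depth(K, (300.0, 200.0), det2, R, t,
                                np.linspace(1.0, 10.0, 91))
```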
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.