Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging
Scenarios
- URL: http://arxiv.org/abs/2402.11826v1
- Date: Mon, 19 Feb 2024 04:39:16 GMT
- Title: Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging
Scenarios
- Authors: Jialei Xu, Xianming Liu, Junjun Jiang, Kui Jiang, Rui Li, Kai Cheng,
Xiangyang Ji
- Abstract summary: This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that produces the final depth in an end-to-end manner.
- Score: 103.72094710263656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation from RGB images plays a pivotal role in 3D vision.
However, its accuracy can deteriorate in challenging environments such as
nighttime or adverse weather conditions. While long-wave infrared cameras offer
stable imaging in such challenging conditions, they are inherently
low-resolution and lack the rich texture and semantics delivered by RGB
images. Current methods focus solely on a single modality because of the
difficulty of identifying and integrating faithful depth cues from both sources.
To address these issues, this paper presents a novel approach that identifies
and integrates dominant cross-modality depth features with a learning-based
framework. Concretely, we independently compute the coarse depth maps with
separate networks by fully utilizing the individual depth cues from each
modality. Since the advantageous depth cues are spread across both modalities,
we propose a novel confidence loss that steers a confidence predictor network to
yield a confidence map specifying latent potential depth areas. With the resulting
confidence map, we propose a multi-modal fusion network that combines the coarse
depth maps into the final depth in an end-to-end manner. Harnessing the proposed
pipeline, our method demonstrates robust depth estimation in a variety of difficult
scenarios. Experimental results on the challenging MS$^2$ and ViViD++ datasets
demonstrate the effectiveness and robustness of our method.
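The pipeline described in the abstract (per-modality coarse depth networks, a confidence predictor trained with a confidence loss, and an end-to-end fusion network) can be illustrated with a minimal PyTorch sketch. All names below (CoarseDepthNet, ConfidencePredictor, FusionNet, confidence_loss) and the specific confidence target are hypothetical stand-ins for the idea, not the authors' published implementation.

```python
# Minimal sketch of confidence-guided RGB/thermal depth fusion.
# Hypothetical module names and losses; for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))


class CoarseDepthNet(nn.Module):
    """Per-modality network producing a coarse depth map from one input image."""
    def __init__(self, in_channels):
        super().__init__()
        self.body = nn.Sequential(conv_block(in_channels, 32), conv_block(32, 32),
                                  nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        return F.softplus(self.body(x))  # keep predicted depth positive


class ConfidencePredictor(nn.Module):
    """Predicts a per-pixel map indicating where the RGB depth should be trusted."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(2, 32), nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, depth_rgb, depth_thermal):
        return torch.sigmoid(self.body(torch.cat([depth_rgb, depth_thermal], dim=1)))


class FusionNet(nn.Module):
    """Refines a confidence-weighted blend of the two coarse depth maps."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(3, 32), conv_block(32, 32),
                                  nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, depth_rgb, depth_thermal, conf):
        blended = conf * depth_rgb + (1.0 - conf) * depth_thermal
        return F.softplus(self.body(torch.cat([blended, depth_rgb, depth_thermal], dim=1)))


def confidence_loss(conf, depth_rgb, depth_thermal, depth_gt):
    # Hypothetical confidence target: prefer the modality with the smaller error.
    with torch.no_grad():
        target = (torch.abs(depth_rgb - depth_gt) < torch.abs(depth_thermal - depth_gt)).float()
    return F.binary_cross_entropy(conf, target)


# Usage sketch with random tensors in place of real RGB/thermal pairs.
rgb = torch.rand(2, 3, 64, 80)
thermal = torch.rand(2, 1, 64, 80)
depth_gt = torch.rand(2, 1, 64, 80) * 10

rgb_net, thermal_net = CoarseDepthNet(3), CoarseDepthNet(1)
conf_net, fusion_net = ConfidencePredictor(), FusionNet()

d_rgb, d_th = rgb_net(rgb), thermal_net(thermal)
conf = conf_net(d_rgb, d_th)
d_fused = fusion_net(d_rgb, d_th, conf)

loss = F.l1_loss(d_fused, depth_gt) + confidence_loss(conf, d_rgb, d_th, depth_gt)
loss.backward()
```

The confidence target here simply prefers whichever modality has the smaller per-pixel error against ground truth; the paper's actual confidence loss and network design may differ.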
Related papers
- Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions [30.148969711689773]
We present a novel approach designed to address the complexities posed by challenging, out-of-distribution data in the single-image depth estimation task.
We systematically generate new, user-defined scenes with a comprehensive set of challenges and associated depth information.
This is achieved by leveraging cutting-edge text-to-image diffusion models with depth-aware control.
arXiv Detail & Related papers (2024-07-23T17:59:59Z)
- Transparent Object Depth Completion [11.825680661429825]
The perception of transparent objects for grasp and manipulation remains a major challenge.
Existing robotic grasping methods that rely heavily on depth maps are not suitable for transparent objects because of their unique visual properties.
We propose an end-to-end network for transparent object depth completion that combines the strengths of single-view RGB-D based depth completion and multi-view depth estimation.
arXiv Detail & Related papers (2024-05-24T07:38:06Z)
- Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving [22.58849429006898]
Current multi-view depth estimation methods, as well as single-view and multi-view fusion methods, fail when given noisy pose settings.
We propose a single-view and multi-view fused depth estimation system that adaptively integrates high-confidence multi-view and single-view results.
Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing.
arXiv Detail & Related papers (2024-03-12T11:18:35Z)
- Fully Self-Supervised Depth Estimation from Defocus Clue [79.63579768496159]
We propose a self-supervised framework that estimates depth purely from a sparse focal stack.
We show that our framework circumvents the need for depth and all-in-focus (AIF) image ground truth, and achieves superior predictions.
arXiv Detail & Related papers (2023-03-19T19:59:48Z)
- On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
- Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection [37.37316176663782]
We propose a depth solving system that fully exploits the visual clues from the subtasks in monocular 3D object detection.
Our method surpasses the current best method by more than 20% (relative) on the Moderate level of the test split in the KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2022-05-19T08:12:55Z)
- End-to-end Learning for Joint Depth and Image Reconstruction from Diffracted Rotation [10.896567381206715]
We propose a novel end-to-end learning approach for depth from diffracted rotation.
Our approach requires a significantly less complex model and less training data, yet it is superior to existing methods in the task of monocular depth estimation.
arXiv Detail & Related papers (2022-04-14T16:14:37Z)
- Robust Depth Completion with Uncertainty-Driven Loss Functions [60.9237639890582]
We introduce uncertainty-driven loss functions that improve the robustness of depth completion and handle its inherent uncertainty; a generic uncertainty-weighted loss of this kind is sketched after this list.
Our method has been tested on the KITTI Depth Completion Benchmark and achieves state-of-the-art robustness in terms of the MAE, iMAE, and iRMSE metrics.
arXiv Detail & Related papers (2021-12-15T05:22:34Z)
- Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data [73.9872931307401]
We propose a novel weakly-supervised framework to train a monocular depth estimation network.
The proposed framework is composed of a shared-weight monocular depth estimation network and a depth reconstruction network used for distillation.
Experimental results demonstrate that our method achieves superior performance compared with unsupervised and semi-supervised learning-based schemes.
arXiv Detail & Related papers (2021-09-23T18:04:12Z)
- Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
A confidence map for the pseudo ground truth depth is estimated to mitigate the performance degradation caused by inaccurate pseudo depth maps.
Experimental results demonstrate performance superior to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z)
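As referenced in the "Robust Depth Completion with Uncertainty-Driven Loss Functions" entry above, a common way to make a depth loss uncertainty-driven is a Laplacian negative log-likelihood in which the network also predicts a per-pixel scale. The sketch below shows that generic formulation with an L1 residual; it is not necessarily that paper's exact loss, and the function and variable names are illustrative.

```python
# Generic uncertainty-driven depth loss (Laplacian negative log-likelihood);
# an illustrative formulation, not a specific paper's implementation.
import torch


def uncertainty_l1_loss(pred_depth, log_b, gt_depth, valid_mask):
    """L1 residual scaled by a predicted per-pixel scale b = exp(log_b),
    plus a log-scale penalty that keeps the network from inflating b everywhere."""
    b = torch.exp(log_b)
    nll = torch.abs(pred_depth - gt_depth) / b + log_b
    return nll[valid_mask].mean()


# Usage with random tensors; a real model would predict both depth and log_b.
pred = torch.rand(2, 1, 64, 80) * 10
pred.requires_grad_(True)
log_b = torch.zeros(2, 1, 64, 80, requires_grad=True)
gt = torch.rand(2, 1, 64, 80) * 10
mask = gt > 0  # supervise only pixels with valid ground truth

loss = uncertainty_l1_loss(pred, log_b, gt, mask)
loss.backward()
```

Pixels where the predicted scale is large contribute less to the residual term, so the loss naturally down-weights regions the network marks as uncertain, at the cost of the log-scale penalty.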