Shedding Light on Depth: Explainability Assessment in Monocular Depth Estimation
- URL: http://arxiv.org/abs/2509.15980v1
- Date: Fri, 19 Sep 2025 13:45:18 GMT
- Title: Shedding Light on Depth: Explainability Assessment in Monocular Depth Estimation
- Authors: Lorenzo Cirillo, Claudio Schiavella, Lorenzo Papa, Paolo Russo, Irene Amerini
- Abstract summary: We study how to analyze MDE networks to map the input image to the predicted depth map.
We assess the quality of the generated visual explanations by selectively perturbing the most relevant and irrelevant pixels.
The proposed Attribution Fidelity metric evaluates the reliability of the feature attributions by assessing their consistency with the predicted depth map.
- Score: 12.223576286931094
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Explainable artificial intelligence is increasingly employed to understand the decision-making process of deep learning models and build trust in their adoption. However, the explainability of Monocular Depth Estimation (MDE) remains largely unexplored despite its wide deployment in real-world applications. In this work, we study how to analyze MDE networks to map the input image to the predicted depth map. More specifically, we investigate well-established feature attribution methods (Saliency Maps, Integrated Gradients, and Attention Rollout) on two MDE models of different computational complexity: METER, a lightweight network, and PixelFormer, a deep network. We assess the quality of the generated visual explanations by selectively perturbing the most relevant and irrelevant pixels, as identified by the explainability methods, and analyzing the impact of these perturbations on the model's output. Moreover, since existing evaluation metrics can have some limitations in measuring the validity of visual explanations for MDE, we additionally introduce the Attribution Fidelity. This metric evaluates the reliability of the feature attributions by assessing their consistency with the predicted depth map. Experimental results demonstrate that Saliency Maps and Integrated Gradients perform well at highlighting the most important input features for lightweight and deep MDE models, respectively. Furthermore, we show that Attribution Fidelity effectively identifies whether an explainability method fails to produce reliable visual maps, even in scenarios where conventional metrics might suggest satisfactory results.
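The perturbation-based assessment described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: `depth_fn`, the perturbation fraction `frac`, and the choice of zero-masking as the perturbation are all assumptions made here for the example.

```python
import numpy as np

def perturbation_impact(depth_fn, image, attribution, frac=0.05, top=True):
    """Mask a fraction of pixels ranked by |attribution| and measure how
    much the predicted depth map changes.

    depth_fn    : callable mapping an (H, W, C) image to an (H, W) depth
                  map; stands in for an MDE network such as METER or
                  PixelFormer (assumption: any such callable works here).
    attribution : per-pixel relevance map (H, W), e.g. produced by
                  Saliency Maps or Integrated Gradients.
    top         : True perturbs the most relevant pixels, False the least
                  relevant ones.
    """
    h, w = attribution.shape
    k = max(1, int(frac * h * w))
    order = np.argsort(np.abs(attribution), axis=None)  # ascending relevance
    idx = order[-k:] if top else order[:k]
    mask = np.ones(h * w, dtype=bool)
    mask[idx] = False
    perturbed = image * mask.reshape(h, w)[..., None]   # zero out selected pixels
    base, pert = depth_fn(image), depth_fn(perturbed)
    # RMSE between original and perturbed predictions: a faithful
    # attribution should produce a larger change when top=True than
    # when top=False.
    return float(np.sqrt(np.mean((base - pert) ** 2)))
```

A faithful explanation is one for which removing the pixels it marks as most relevant degrades the prediction far more than removing the pixels it marks as least relevant; comparing the two scores returned by this function for `top=True` and `top=False` captures that contrast.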
Related papers
- StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion [56.28564075246147]
StarryGazer is a framework that predicts dense depth images from a single sparse depth image and an RGB image.
We employ a pre-trained MDE model to produce relative depth images.
A refinement network is trained with the synthetic pairs, incorporating the relative depth maps and RGB images to improve the model's accuracy and robustness.
arXiv Detail & Related papers (2025-12-15T09:56:09Z)
- Relative Pose Estimation through Affine Corrections of Monocular Depth Priors [69.59216331861437]
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities.
We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
arXiv Detail & Related papers (2025-01-09T18:58:30Z)
- Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation [50.920911532133154]
The intrinsic ill-posedness and ordinal-sensitive nature of monocular depth estimation (MDE) models pose major challenges to the estimation of uncertainty degree.
We propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions.
By simply introducing additional training regularization terms, our model, with surprisingly simple formations and without requiring extra modules or multiple inferences, can provide uncertainty estimations with state-of-the-art reliability.
arXiv Detail & Related papers (2023-07-19T12:11:15Z)
- Self-Supervised Monocular Depth Estimation with Internal Feature Fusion [12.874712571149725]
Self-supervised learning for depth estimation uses geometry in image sequences for supervision.
We propose a novel depth estimation network, DIFFNet, which can make use of semantic information in downsampling and upsampling procedures.
arXiv Detail & Related papers (2021-10-18T17:31:11Z)
- Who Explains the Explanation? Quantitatively Assessing Feature Attribution Methods [0.0]
We propose a novel evaluation metric -- the Focus -- designed to quantify the faithfulness of explanations.
We show the robustness of the metric through randomization experiments, and then use Focus to evaluate and compare three popular explainability techniques.
Our results find LRP and GradCAM to be consistent and reliable, with the latter remaining most competitive even when applied to poorly performing models.
arXiv Detail & Related papers (2021-09-28T07:10:24Z)
- Towards Interpretable Deep Networks for Monocular Depth Estimation [78.84690613778739]
We quantify the interpretability of a deep MDE network by the depth selectivity of its hidden units.
We propose a method to train interpretable MDE deep networks without changing their original architectures.
Experimental results demonstrate that our method is able to enhance the interpretability of deep MDE networks.
arXiv Detail & Related papers (2021-08-11T16:43:45Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction [27.750031877854717]
We propose SAFENet, which is designed to leverage semantic information to overcome the limitations of the photometric loss.
Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge.
Experiments on the KITTI dataset demonstrate that our methods compete with or even outperform the state-of-the-art methods.
arXiv Detail & Related papers (2020-10-06T17:22:25Z)
- Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z)
- Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations [15.067369314723958]
We propose an objective measure to evaluate the reliability of explanations of deep models.
Our approach is based on changes in the network's outcome resulting from the perturbation of input images in an adversarial way.
We also propose a straightforward application of our approach to clean relevance maps, creating more interpretable maps without any loss of essential explanation.
arXiv Detail & Related papers (2020-04-22T19:57:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.