Accuracy Does Not Guarantee Human-Likeness in Monocular Depth Estimators
- URL: http://arxiv.org/abs/2512.08163v1
- Date: Tue, 09 Dec 2025 01:42:00 GMT
- Title: Accuracy Does Not Guarantee Human-Likeness in Monocular Depth Estimators
- Authors: Yuki Kubota, Taiki Fukiage
- Abstract summary: Deep neural networks (DNNs) have achieved superhuman accuracy on physical-based benchmarks. Monocular depth estimation is a fundamental capability for real-world applications such as autonomous driving and robotics. Research in object recognition has revealed a complex trade-off between model accuracy and human-like behavior.
- Score: 2.466518228012258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation is a fundamental capability for real-world applications such as autonomous driving and robotics. Although deep neural networks (DNNs) have achieved superhuman accuracy on physical-based benchmarks, a key challenge remains: aligning model representations with human perception, a promising strategy for enhancing model robustness and interpretability. Research in object recognition has revealed a complex trade-off between model accuracy and human-like behavior, raising the question of whether a similar divergence exists in depth estimation, particularly for natural outdoor scenes where benchmarks rely on sensor-based ground truth rather than human perceptual estimates. In this study, we systematically investigated the relationship between model accuracy and human similarity across 69 monocular depth estimators using the KITTI dataset. To dissect the structure of error patterns on a factor-by-factor basis, we applied affine fitting to decompose prediction errors into interpretable components. Intriguingly, our results reveal that while humans and DNNs share certain estimation biases (positive error correlations), there are distinct trade-off relationships between model accuracy and human similarity. This finding indicates that improving accuracy does not necessarily lead to more human-like behavior, underscoring the necessity of developing multifaceted, human-centric evaluations beyond traditional accuracy.
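The abstract mentions affine fitting as a way to split prediction errors into interpretable components, and error correlations as a sign of shared biases. The listing does not give the authors' exact procedure, so the following is only a minimal sketch under stated assumptions: a single least-squares fit of the form estimate ≈ a * ground_truth + b per observer, with the leftover residual treated as the non-affine error component. The function name `affine_decompose` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def affine_decompose(estimate, ground_truth):
    """Fit estimate ~ a * ground_truth + b by least squares (assumed form).

    Returns the scale a, the shift b, and the residual left after removing
    the affine part. This is an illustrative decomposition, not the paper's.
    """
    est = np.asarray(estimate, dtype=float).ravel()
    gt = np.asarray(ground_truth, dtype=float).ravel()
    design = np.stack([gt, np.ones_like(gt)], axis=1)  # columns: [gt, 1]
    (a, b), *_ = np.linalg.lstsq(design, est, rcond=None)
    residual = est - (a * gt + b)
    return a, b, residual

# Synthetic example (made-up numbers): both observers share a non-affine bias.
rng = np.random.default_rng(0)
gt = rng.uniform(5.0, 50.0, size=200)          # hypothetical depths in metres
shared_bias = np.sin(gt / 5.0)                 # hypothetical shared distortion
human = 0.8 * gt + 2.0 + shared_bias + rng.normal(0.0, 0.1, gt.size)
model = 1.05 * gt - 1.0 + 0.5 * shared_bias + rng.normal(0.0, 0.1, gt.size)

a_h, b_h, res_h = affine_decompose(human, gt)
a_m, b_m, res_m = affine_decompose(model, gt)

# Total error splits exactly into an affine component plus the residual.
affine_err_h = (a_h * gt + b_h) - gt
total_err_h = human - gt

# Correlation of residuals: positive values suggest shared non-affine biases.
residual_corr = np.corrcoef(res_h, res_m)[0, 1]
print(f"human scale={a_h:.3f} shift={b_h:.3f}")
print(f"model scale={a_m:.3f} shift={b_m:.3f}")
print(f"residual correlation={residual_corr:.3f}")
```

In this sketch the scale and shift capture global compression or offset of depth, while the residual correlation is one possible stand-in for the "positive error correlations" the abstract refers to. A model can have a small total error and still show a low residual correlation with humans, which is the kind of divergence the paper is about.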
Related papers
- The Impact of Concept Explanations and Interventions on Human-Machine Collaboration [0.03999851878220877]
Concept Bottleneck Models (CBMs) were introduced to predict human-defined concepts as an intermediate step before predicting task labels. CBMs improve interpretability compared to standard Deep Neural Networks (DNNs). However, this increased alignment did not translate to a significant increase in task accuracy.
arXiv Detail & Related papers (2025-10-19T16:44:24Z) - Uncertainty Estimation by Human Perception versus Neural Models [7.273936646577467]
Modern neural networks (NNs) often achieve high predictive accuracy but are poorly calibrated. We investigate how human uncertainty compares to uncertainty estimated by NNs.
arXiv Detail & Related papers (2025-06-18T20:00:20Z) - CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding [62.075029712357]
This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM).
CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and weights the guidance with precision weights estimated by the inherent property of diffusion models.
We apply CogDPM to real-world prediction tasks using the United Kingdom precipitation and surface wind datasets.
arXiv Detail & Related papers (2024-05-03T15:54:50Z) - Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards
Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Processing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z) - Improving Prediction of Cognitive Performance using Deep Neural Networks
in Sparse Data [2.867517731896504]
We used data from an observational cohort study, Midlife in the United States (MIDUS), to model executive function and episodic memory measures.
Deep neural network (DNN) models consistently ranked highest in all of the cognitive performance prediction tasks.
arXiv Detail & Related papers (2021-12-28T22:23:08Z) - Interpretable Additive Recurrent Neural Networks For Multivariate
Clinical Time Series [4.125698836261585]
We present the Interpretable-RNN (I-RNN) that balances model complexity and accuracy by forcing the relationship between variables in the model to be additive.
I-RNN specifically captures the unique characteristics of clinical time series, which are unevenly sampled in time, asynchronously acquired, and have missing data.
We evaluate the I-RNN model on the Physionet 2012 Challenge dataset to predict in-hospital mortality, and on a real-world clinical decision support task: predicting hemodynamic interventions in the intensive care unit.
arXiv Detail & Related papers (2021-09-15T22:30:19Z) - Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge that limits the widespread adoption of deep learning has been its fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z) - Interpretable Social Anchors for Human Trajectory Forecasting in Crowds [84.20437268671733]
We propose a neural network-based system to predict human trajectory in crowds.
We learn interpretable rule-based intents, and then utilise the expressibility of neural networks to model scene-specific residual.
Our architecture is tested on the interaction-centric benchmark TrajNet++.
arXiv Detail & Related papers (2021-05-07T09:22:34Z) - Firearm Detection via Convolutional Neural Networks: Comparing a
Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans
by measuring error consistency [10.028543085687803]
A central problem in cognitive science and behavioural neuroscience is to ascertain whether two or more decision makers (be they brains or algorithms) use the same strategy.
We introduce trial-by-trial error consistency, a quantitative analysis for measuring whether two decision making systems systematically make errors on the same inputs.
arXiv Detail & Related papers (2020-06-30T12:47:17Z)
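The last entry above describes error consistency as a measure of whether two decision makers systematically err on the same inputs. One common way to quantify this is Cohen's kappa over per-trial correctness: observed agreement corrected for the agreement expected from the two accuracies alone. The sketch below assumes that formulation; the function name `error_consistency` and the toy inputs are hypothetical.

```python
import numpy as np

def error_consistency(correct_a, correct_b):
    """Cohen's kappa over per-trial correctness of two observers (assumed form).

    Inputs are equal-length boolean-like sequences: True if the observer was
    correct on that trial. Returns NaN when expected agreement is 1.
    """
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("both observers need the same trials")
    c_obs = np.mean(a == b)                     # observed agreement
    p_a, p_b = a.mean(), b.mean()               # accuracies
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)   # agreement expected by chance
    if np.isclose(c_exp, 1.0):
        return float("nan")                     # kappa undefined in this case
    return float((c_obs - c_exp) / (1 - c_exp))

# Toy example: same accuracy (50%), different overlap of errors.
same_errors = error_consistency([1, 1, 0, 0], [1, 1, 0, 0])
chance_level = error_consistency([1, 1, 0, 0], [1, 0, 1, 0])
print(same_errors, chance_level)
```

Because the measure corrects for accuracy, two observers with identical accuracy can still score near zero if their errors fall on different trials.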
This list is automatically generated from the titles and abstracts of the papers in this site.