Related papers: The Role of Model Confidence on Bias Effects in Measured Uncertainties for Vision-Language Models

URL: http://arxiv.org/abs/2506.16724v2
Date: Thu, 09 Oct 2025 06:08:47 GMT
Title: The Role of Model Confidence on Bias Effects in Measured Uncertainties for Vision-Language Models
Authors: Xinyi Liu, Weiguang Wang, Hangfeng He
Abstract summary: We find that mitigating prompt-introduced bias improves uncertainty quantification in GPT-4o. We also find that all considered biases have greater effects in both uncertainties when bias-free model confidence is lower. These distinct effects deepen our understanding of bias mitigation for uncertainty quantification and potentially inform the development of more advanced techniques.
Score: 10.069846144480119
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the growing adoption of Large Language Models (LLMs) for open-ended tasks, accurately assessing epistemic uncertainty, which reflects a model's lack of knowledge, has become crucial to ensuring reliable outcomes. However, quantifying epistemic uncertainty in such tasks is challenging due to the presence of aleatoric uncertainty, which arises from multiple valid answers. While bias can introduce noise into epistemic uncertainty estimation, it may also reduce noise from aleatoric uncertainty. To investigate this trade-off, we conduct experiments on Visual Question Answering (VQA) tasks and find that mitigating prompt-introduced bias improves uncertainty quantification in GPT-4o. Building on prior work showing that LLMs tend to copy input information when model confidence is low, we further analyze how these prompt biases affect measured epistemic and aleatoric uncertainty across varying bias-free confidence levels with GPT-4o and Qwen2-VL. We find that all considered biases have greater effects in both uncertainties when bias-free model confidence is lower. Moreover, lower bias-free model confidence is associated with greater bias-induced underestimation of epistemic uncertainty, resulting in overconfident estimates, whereas it has no significant effect on the direction of bias effect in aleatoric uncertainty estimation. These distinct effects deepen our understanding of bias mitigation for uncertainty quantification and potentially inform the development of more advanced techniques.
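As a rough illustration of the epistemic/aleatoric split the abstract refers to, the sketch below uses the common entropy-based decomposition: total uncertainty is the entropy of the averaged answer distribution, the aleatoric part is the average entropy of the individual distributions, and the epistemic part is the difference. This is an assumption chosen for illustration, not necessarily the measurement used in the paper; the function names and the toy answer distributions are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def decompose_uncertainty(sample_dists):
    """Split total uncertainty into aleatoric and epistemic parts.

    sample_dists: list of answer distributions over the same answer set,
    e.g. one per sampled prompt variant or ensemble member (an assumption
    made for this sketch).
    """
    n = len(sample_dists)
    k = len(sample_dists[0])
    # Average the distributions to get the overall predictive distribution.
    mean_dist = [sum(d[i] for d in sample_dists) / n for i in range(k)]
    total = entropy(mean_dist)
    # Expected entropy: spread that remains within each individual distribution.
    aleatoric = sum(entropy(d) for d in sample_dists) / n
    # The remainder reflects disagreement between the distributions.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# Samples agree that two answers are equally valid: all uncertainty is aleatoric.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
# Samples each commit to a different answer: all uncertainty is epistemic.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

In the first toy case the epistemic term is zero because every sample reports the same spread over valid answers; in the second the aleatoric term is zero because each sample is individually certain and the samples merely disagree.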

Related papers

Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation [68.106428321492]
Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate high-level plans with appropriate grounding. LLM hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans. We present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes the uncertainty into epistemic and intrinsic uncertainty, each estimated separately.
arXiv Detail & Related papers (2025-10-09T10:26:58Z)
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation [20.726685669562496]
Hallucinations are a common issue that undermines the reliability of large language models (LLMs). Recent studies have identified a subset of hallucinations, known as confabulations, which arise due to predictive uncertainty of LLMs. To detect confabulations, various methods for estimating predictive uncertainty in natural language generation (NLG) have been developed.
arXiv Detail & Related papers (2025-10-02T17:54:09Z)
Why Machine Learning Models Fail to Fully Capture Epistemic Uncertainty [3.4970971805884474]
We make use of a more fine-grained taxonomy of epistemic uncertainty sources in machine learning models. We show that high model bias can lead to misleadingly low estimates of epistemic uncertainty. Common second-order uncertainty methods systematically blur bias-induced errors into aleatoric estimates.
arXiv Detail & Related papers (2025-05-29T14:50:46Z)
Token-Level Uncertainty Estimation for Large Language Model Reasoning [24.56760223952017]
Large Language Models (LLMs) have demonstrated impressive capabilities, but their output quality remains inconsistent across various application scenarios. We propose a token-level uncertainty estimation framework to enable LLMs to self-assess and self-improve their generation quality in mathematical reasoning.
arXiv Detail & Related papers (2025-05-16T22:47:32Z)
Probabilistic Modeling of Disparity Uncertainty for Robust and Efficient Stereo Matching [61.73532883992135]
We propose a new uncertainty-aware stereo matching framework. We adopt Bayes risk as the measurement of uncertainty and use it to separately estimate data and model uncertainty.
arXiv Detail & Related papers (2024-12-24T23:28:20Z)
MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty [10.154013836043816]
We investigate previous uncertainty quantification methods under the presence of data uncertainty. Our findings show that previous methods relatively struggle compared to single-answer settings. We observe that entropy- and consistency-based methods effectively estimate model uncertainty, even in the presence of data uncertainty.
arXiv Detail & Related papers (2024-08-13T11:17:31Z)
Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widely existing but actually-neglected phenomenon that most confidence estimation methods are harmful for detecting misclassification errors. We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
One step closer to unbiased aleatoric uncertainty estimation [71.55174353766289]
We propose a new estimation method by actively de-noising the observed data. By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.
arXiv Detail & Related papers (2023-12-16T14:59:11Z)
A Deeper Look into Aleatoric and Epistemic Uncertainty Disentanglement [7.6146285961466]
In this paper, we generalize methods to produce disentangled uncertainties to work with different uncertainty quantification methods. We show that there is an interaction between learning aleatoric and epistemic uncertainty, which is unexpected and violates assumptions on aleatoric uncertainty. We expect that our formulation and results help practitioners and researchers choose uncertainty methods and expand the use of disentangled uncertainties.
arXiv Detail & Related papers (2022-04-20T08:41:37Z)
Dense Uncertainty Estimation via an Ensemble-based Conditional Latent Variable Model [68.34559610536614]
We argue that the aleatoric uncertainty is an inherent attribute of the data and can only be correctly estimated with an unbiased oracle model. We propose a new sampling and selection strategy at train time to approximate the oracle model for aleatoric uncertainty estimation. Our results show that our solution achieves both accurate deterministic results and reliable uncertainty estimation.
arXiv Detail & Related papers (2021-11-22T08:54:10Z)
DEUP: Direct Epistemic Uncertainty Prediction [56.087230230128185]
Epistemic uncertainty is part of out-of-sample prediction error due to the lack of knowledge of the learner. We propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty.
arXiv Detail & Related papers (2021-02-16T23:50:35Z)
Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions [121.10450359856242]
We develop a frequentist procedure that utilizes influence functions of a model's loss functional to construct a jackknife (or leave-one-out) estimator of predictive confidence intervals. The DJ satisfies (1) and (2), is applicable to a wide range of deep learning models, is easy to implement, and can be applied in a post-hoc fashion without interfering with model training or compromising its accuracy.
arXiv Detail & Related papers (2020-06-29T13:36:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.