Estimating LLM Uncertainty with Evidence
- URL: http://arxiv.org/abs/2502.00290v5
- Date: Fri, 09 May 2025 05:37:54 GMT
- Title: Estimating LLM Uncertainty with Evidence
- Authors: Huan Ma, Jingdong Chen, Joey Tianyi Zhou, Guangyu Wang, Changqing Zhang
- Abstract summary: We present Logits-induced token uncertainty (LogTokU) as a framework for estimating decoupled token uncertainty in Large Language Models. We employ evidence modeling to implement LogTokU and use the estimated uncertainty to guide downstream tasks.
- Score: 66.51144261657983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past few years, Large Language Models (LLMs) have developed rapidly and are widely applied across domains. However, LLMs suffer from hallucinations, generating unreliable responses when they lack the relevant knowledge. To flag potential hallucinations, uncertainty estimation methods have been introduced, and most of them confirm that response reliability hinges on a few critical tokens. However, probability-based methods perform poorly at identifying token reliability, which limits their practical utility. In this paper, we show that probability-based methods fail to estimate token reliability because they discard the evidence-strength information accumulated during training. We therefore present Logits-induced token uncertainty (LogTokU), a framework for estimating decoupled token uncertainty in LLMs that enables real-time uncertainty estimation without multiple sampling passes. We implement LogTokU with evidence modeling and use the estimated uncertainty to guide downstream tasks. Experimental results demonstrate the effectiveness and promise of LogTokU.
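To make the evidence-modeling idea concrete, here is a minimal sketch, assuming a ReLU evidence function and the standard Dirichlet-based decomposition from evidential deep learning (the paper's exact formulation may differ), of deriving decoupled aleatoric and epistemic token uncertainty directly from the logits of a single decoding step:

```python
import numpy as np

def token_uncertainty_from_logits(logits: np.ndarray):
    """Toy evidence-based token uncertainty (illustrative, not the paper's exact recipe).

    logits: (vocab_size,) raw logits for a single generated token.
    Returns (aleatoric, epistemic) uncertainty estimates.
    """
    # Assumed evidence function: non-negative part of the logits.
    evidence = np.maximum(logits, 0.0)
    alpha = evidence + 1.0                  # Dirichlet concentration parameters
    strength = alpha.sum()                  # total evidence strength S
    probs = alpha / strength                # expected categorical distribution

    # Aleatoric: entropy of the expected categorical distribution.
    aleatoric = -(probs * np.log(probs + 1e-12)).sum()
    # Epistemic: vanishes as total evidence strength grows (K / S heuristic from EDL).
    epistemic = len(alpha) / strength
    return aleatoric, epistemic

# Example: a confident token (peaked logits) vs. an unsure one (flat logits).
confident = np.array([9.0, 1.0, 0.5, 0.2])
unsure = np.array([1.1, 1.0, 0.9, 1.0])
print(token_uncertainty_from_logits(confident))
print(token_uncertainty_from_logits(unsure))
```

Because everything is computed from the logits of the current step, no extra sampling passes are needed, which is the practical point of the framework.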
Related papers
- Token-Level Uncertainty Estimation for Large Language Model Reasoning [24.56760223952017]
Large Language Models (LLMs) have demonstrated impressive capabilities, but their output quality remains inconsistent across various application scenarios. We propose a token-level uncertainty estimation framework to enable LLMs to self-assess and self-improve their generation quality in mathematical reasoning.
arXiv Detail & Related papers (2025-05-16T22:47:32Z) - Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models [12.69571386421462]
In this paper, we investigate the possibility of eliminating the need for test-time sampling for uncertainty estimation. We distill an off-the-shelf Bayesian LLM into a non-Bayesian student LLM by minimizing the divergence between their predictive distributions. Our experiments demonstrate that uncertainty estimation capabilities learned on training data can successfully generalize to unseen test data.
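A minimal sketch of such a distillation objective, assuming the teacher's predictive distribution is approximated by averaging softmax outputs over Monte Carlo samples and the divergence is a forward KL (the paper's actual setup may make different choices):

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher_sample_logits: torch.Tensor,
                      student_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher predictive || student) for one token position (illustrative sketch).

    teacher_sample_logits: (num_mc_samples, vocab) logits from a Bayesian teacher
                           (e.g. MC-dropout or ensemble samples).
    student_logits:        (vocab,) logits from the non-Bayesian student.
    """
    # Teacher predictive distribution = average of per-sample softmaxes.
    teacher_probs = F.softmax(teacher_sample_logits, dim=-1).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="sum")

# Dummy example: 5 teacher samples over a 10-token vocabulary.
loss = distillation_loss(torch.randn(5, 10), torch.randn(10, requires_grad=True))
loss.backward()
```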
arXiv Detail & Related papers (2025-05-16T22:26:03Z) - A Survey of Uncertainty Estimation Methods on Large Language Models [12.268958536971782]
Large language models (LLMs) have demonstrated remarkable capabilities across various tasks.
These models can produce biased, hallucinated, or non-factual responses that are camouflaged by their fluency and realistic appearance.
Uncertainty estimation is a key method for addressing this challenge.
arXiv Detail & Related papers (2025-02-28T20:38:39Z) - Probabilistic Modeling of Disparity Uncertainty for Robust and Efficient Stereo Matching [61.73532883992135]
We propose a new uncertainty-aware stereo matching framework. We adopt Bayes risk as the measurement of uncertainty and use it to separately estimate data and model uncertainty.
arXiv Detail & Related papers (2024-12-24T23:28:20Z) - On Verbalized Confidence Scores for LLMs [25.160810008907397]
Uncertainty quantification for large language models (LLMs) can help establish human trust in their responses.
This work focuses on asking the LLM itself to verbalize its uncertainty with a confidence score as part of its output tokens.
We assess the reliability of verbalized confidence scores with respect to different datasets, models, and prompt methods.
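As a rough illustration of the verbalized-confidence setup (the prompt template and parsing rule below are assumptions, not the paper's exact prompts):

```python
import re

def confidence_prompt(question: str) -> str:
    # Hypothetical prompt template; the paper studies several prompt variants.
    return (f"{question}\n"
            "Answer the question, then state your confidence that the answer is "
            "correct as 'Confidence: <0-100>'.")

def parse_verbalized_confidence(completion: str) -> float | None:
    """Extract the self-reported confidence (0-1) from a model completion, if present."""
    match = re.search(r"Confidence:\s*(\d+(?:\.\d+)?)", completion)
    return float(match.group(1)) / 100.0 if match else None

print(parse_verbalized_confidence("Paris.\nConfidence: 85"))  # 0.85
```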
arXiv Detail & Related papers (2024-12-19T11:10:36Z) - Do LLMs estimate uncertainty well in instruction-following? [9.081508933326644]
Large language models (LLMs) could be valuable personal AI agents across various domains, provided they can precisely follow user instructions.
We present the first systematic evaluation of the uncertainty estimation abilities of LLMs in the context of instruction-following.
Our findings show that existing uncertainty methods struggle, particularly when models make subtle errors in instruction following.
arXiv Detail & Related papers (2024-10-18T16:32:10Z) - Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models [96.43562963756975]
We train a regression model whose target variable is the gap between the conditional and the unconditional generation confidence.
We use this learned conditional dependency model to modulate the uncertainty of the current generation step based on the uncertainty of the previous step.
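A sketch of one possible reading of this setup; the placeholder features and the simple additive modulation rule are assumptions for illustration rather than the paper's exact design:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Training target: conditional confidence minus unconditional confidence per step.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 8))     # per-step features (placeholder)
cond_conf = rng.uniform(size=1000)        # confidence given the full context
uncond_conf = rng.uniform(size=1000)      # confidence with the context removed
gap = cond_conf - uncond_conf

dependency_model = Ridge().fit(features, gap)

def modulated_uncertainty(current_u: float, prev_u: float, step_features) -> float:
    """Shift the current step's uncertainty by the predicted conditional dependency
    on the previous step (a simple additive rule, assumed for illustration)."""
    predicted_gap = float(dependency_model.predict(np.atleast_2d(step_features))[0])
    return current_u + predicted_gap * prev_u

print(modulated_uncertainty(0.4, 0.7, features[0]))
```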
arXiv Detail & Related papers (2024-08-20T09:42:26Z) - MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty [10.154013836043816]
We investigate previous uncertainty quantification methods under the presence of data uncertainty.
Our findings show that previous methods struggle in this setting compared to single-answer settings.
We observe that entropy- and consistency-based methods effectively estimate model uncertainty, even in the presence of data uncertainty.
arXiv Detail & Related papers (2024-08-13T11:17:31Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach [6.209293868095268]
We study the problem of uncertainty estimation and calibration for LLMs.
We propose a supervised approach that leverages labeled datasets to estimate the uncertainty in LLMs' responses.
Our method is easy to implement and adaptable to different levels of model accessibility including black box, grey box, and white box.
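A minimal supervised-calibration sketch in this spirit, using made-up grey-box features and synthetic correctness labels purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features: mean token log-prob and response length; labels: response judged correct.
rng = np.random.default_rng(1)
features = np.column_stack([rng.normal(-1.0, 0.5, 500),   # mean token log-prob
                            rng.integers(5, 60, 500)])    # response length
labels = (features[:, 0] > -1.0).astype(int)              # placeholder correctness labels

calibrator = LogisticRegression().fit(features, labels)

def estimated_confidence(mean_logprob: float, length: int) -> float:
    """Predicted probability that a response is correct, used as a confidence score."""
    return float(calibrator.predict_proba([[mean_logprob, length]])[0, 1])

print(estimated_confidence(-0.6, 20))
```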
arXiv Detail & Related papers (2024-04-24T17:10:35Z) - Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification [116.77055746066375]
Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output.
We propose a novel fact-checking and hallucination detection pipeline based on token-level uncertainty quantification.
arXiv Detail & Related papers (2024-03-07T17:44:17Z) - Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs).
We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties.
The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z) - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
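Mechanically, ensembling over clarifications can be summarized as below; the split shown is the standard ensemble entropy decomposition (entropy of the mean vs. mean entropy), and how its components map onto data versus model uncertainty under clarification ensembling is part of the paper's framework rather than something this sketch asserts:

```python
import numpy as np

def clarification_ensemble(per_clarification_probs: np.ndarray):
    """Entropy-based summary over an ensemble of clarified inputs (illustrative).

    per_clarification_probs: (num_clarifications, num_answers) answer distributions,
    one per clarified version of the ambiguous input.
    Returns (total_entropy, mean_entropy, disagreement).
    """
    eps = 1e-12
    mean_probs = per_clarification_probs.mean(axis=0)
    total_entropy = -(mean_probs * np.log(mean_probs + eps)).sum()
    mean_entropy = -(per_clarification_probs *
                     np.log(per_clarification_probs + eps)).sum(axis=1).mean()
    disagreement = total_entropy - mean_entropy    # mutual-information-style gap
    return total_entropy, mean_entropy, disagreement

# Two clarifications that lead to opposite answers -> large disagreement term.
probs = np.array([[0.9, 0.1], [0.1, 0.9]])
print(clarification_ensemble(probs))
```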
arXiv Detail & Related papers (2023-11-15T05:58:35Z) - Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models [27.491408293411734]
Large Language Models (LLMs) show promising results in language generation and instruction following but frequently "hallucinate".
Our research starts from a simple observation: not all tokens in auto-regressive text equally represent the underlying meaning.
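A toy sketch of relevance-weighted token uncertainty in this spirit; the relevance scores are taken as given here, whereas the paper derives them from semantic similarity:

```python
import numpy as np

def relevance_weighted_uncertainty(token_nll: np.ndarray, relevance: np.ndarray) -> float:
    """Reweight per-token uncertainty by each token's semantic relevance (illustrative).

    token_nll: (seq_len,) negative log-likelihood of each generated token.
    relevance: (seq_len,) non-negative relevance scores (assumed given here).
    """
    weights = relevance / (relevance.sum() + 1e-12)
    return float((weights * token_nll).sum())

nll = np.array([0.2, 3.0, 0.1, 0.3])         # one uncertain token (the 3.0 entry)
relevant = np.array([0.1, 0.8, 0.05, 0.05])  # that token carries most of the meaning
filler = np.array([0.8, 0.05, 0.1, 0.05])    # the uncertain token is just filler
print(relevance_weighted_uncertainty(nll, relevant))  # high uncertainty
print(relevance_weighted_uncertainty(nll, filler))    # low uncertainty
```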
arXiv Detail & Related papers (2023-07-03T22:17:16Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of the evidence carried by each sample, and use it to dynamically reweight the objective loss terms so that the network focuses more on the representation learning of uncertain classes.
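For intuition, the sketch below computes standard evidential-deep-learning quantities plus the diagonal of the Dirichlet Fisher information, which is one plausible reweighting signal; the paper's actual $\mathcal{I}$-EDL loss is more involved:

```python
import numpy as np
from scipy.special import polygamma

def dirichlet_fisher_diag(alpha: np.ndarray) -> np.ndarray:
    """Diagonal of the Fisher information matrix of Dirichlet(alpha):
    I_ii = trigamma(alpha_i) - trigamma(sum(alpha))."""
    return polygamma(1, alpha) - polygamma(1, alpha.sum())

# Toy example: evidence from a network output for a 4-class problem.
evidence = np.array([8.0, 0.3, 0.1, 0.2])
alpha = evidence + 1.0
uncertainty_mass = len(alpha) / alpha.sum()   # standard EDL uncertainty
info_weights = dirichlet_fisher_diag(alpha)   # larger for classes with little evidence
print(uncertainty_mass, info_weights)
```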
arXiv Detail & Related papers (2023-03-03T16:12:59Z)