Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
- URL: http://arxiv.org/abs/2503.15850v1
- Date: Thu, 20 Mar 2025 05:04:29 GMT
- Title: Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
- Authors: Xiaoou Liu, Tiejin Chen, Longchao Da, Chacha Chen, Zhen Lin, Hua Wei
- Abstract summary: Large Language Models (LLMs) excel in text generation, reasoning, and decision-making in high-stakes domains such as healthcare, law, and transportation. Uncertainty quantification (UQ) enhances trustworthiness by estimating confidence in outputs, enabling risk mitigation and selective prediction. We introduce a new taxonomy that categorizes UQ methods based on computational efficiency and uncertainty dimensions.
- Score: 11.737403011836532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) excel in text generation, reasoning, and decision-making, enabling their adoption in high-stakes domains such as healthcare, law, and transportation. However, their reliability is a major concern, as they often produce plausible but incorrect responses. Uncertainty quantification (UQ) enhances trustworthiness by estimating confidence in outputs, enabling risk mitigation and selective prediction. However, traditional UQ methods struggle with LLMs due to computational constraints and decoding inconsistencies. Moreover, LLMs introduce unique uncertainty sources, such as input ambiguity, reasoning path divergence, and decoding stochasticity, that extend beyond classical aleatoric and epistemic uncertainty. To address this, we introduce a new taxonomy that categorizes UQ methods based on computational efficiency and uncertainty dimensions (input, reasoning, parameter, and prediction uncertainty). We evaluate existing techniques, assess their real-world applicability, and identify open challenges, emphasizing the need for scalable, interpretable, and robust UQ approaches to enhance LLM reliability.
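As a concrete reference point for the prediction-uncertainty dimension discussed in the abstract, the sketch below shows two common baseline signals, length-normalized sequence log-probability and mean token entropy, used for selective prediction. It is a minimal illustration with stubbed inputs and illustrative function names, not a method from the survey itself.

```python
# Minimal sketch: baseline prediction-uncertainty signals and selective prediction.
# The LLM is abstracted away; we assume per-step token probabilities are available.
import numpy as np

def sequence_confidence(token_logprobs: np.ndarray) -> float:
    """Length-normalized log-probability of a generated answer (higher = more confident)."""
    return float(np.mean(token_logprobs))

def mean_token_entropy(step_probs: list) -> float:
    """Average predictive entropy over decoding steps (higher = more uncertain)."""
    entropies = [-np.sum(p * np.log(p + 1e-12)) for p in step_probs]
    return float(np.mean(entropies))

def selective_predict(answer: str, confidence: float, threshold: float = -1.0):
    """Selective prediction: return the answer only if confidence clears the threshold."""
    return answer if confidence >= threshold else None  # None = abstain / defer to a human

# Toy usage with made-up per-step token distributions (5-token vocabulary, 4 decoding steps).
rng = np.random.default_rng(0)
step_probs = [rng.dirichlet(np.ones(5)) for _ in range(4)]
token_logprobs = np.array([np.log(p.max()) for p in step_probs])
print(sequence_confidence(token_logprobs), mean_token_entropy(step_probs))
print(selective_predict("Paris", sequence_confidence(token_logprobs)))
```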
Related papers
- Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) frequently hallucinate due to misaligned self-awareness.
Existing approaches mitigate hallucinations via uncertainty estimation or query rejection.
We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems.
arXiv Detail & Related papers (2025-03-04T03:16:02Z)
- Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation [0.0]
We explore uncertainty estimation as a proxy for correctness in LLM-generated code. We adapt two state-of-the-art techniques from natural language generation to the domain of code generation. Our findings indicate a strong correlation between the uncertainty computed through these techniques and correctness.
arXiv Detail & Related papers (2025-02-17T10:03:01Z)
- Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation [8.635811152610604]
Uncertainty Quantification (UQ) is crucial for ensuring the safety and robustness of AI systems. We propose a label-confidence-aware (LCA) uncertainty estimate based on the Kullback-Leibler divergence between the sampled outputs and the label source.
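The exact LCA estimator is defined in the paper; the sketch below only illustrates the Kullback-Leibler ingredient over a shared set of candidate answers, with made-up numbers for both distributions.

```python
# Generic KL divergence between a sampling-based answer distribution and a
# (hypothetical) label-source distribution over the same candidate answers.
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) for two categorical distributions on the same support."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Frequencies of candidate answers across sampled generations vs. a label-source distribution.
sampled = np.array([0.6, 0.3, 0.1])   # e.g. candidates "A", "B", "C"
label   = np.array([0.9, 0.05, 0.05])
print(kl_divergence(sampled, label))  # larger value -> samples disagree more with the label source
```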
arXiv Detail & Related papers (2024-12-10T07:35:23Z)
- Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning [10.457661605916435]
Large language models (LLMs) have revolutionized natural language processing with their impressive reasoning and question-answering capabilities. However, LLMs are sometimes prone to generating credible-sounding but incorrect information, a phenomenon known as hallucination. We introduce a novel uncertainty-aware causal language modeling loss function, grounded in the principles of decision theory.
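The exact decision-theoretic loss is defined in the paper. As a loose, generic stand-in for the idea of an uncertainty-aware training objective, the sketch below augments token-level cross-entropy with a Brier-style penalty that pulls the model's confidence toward its empirical correctness; it is not the paper's loss.

```python
# Illustrative uncertainty-aware loss: cross-entropy plus a calibration (Brier-style) penalty.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uncertainty_aware_loss(logits: np.ndarray, targets: np.ndarray, lam: float = 0.5) -> float:
    """logits: (T, V) next-token logits, targets: (T,) gold token ids (generic stand-in, not the paper's loss)."""
    probs = softmax(logits)
    ce = -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))  # standard cross-entropy
    conf = probs.max(axis=-1)                        # model confidence per step
    correct = (probs.argmax(axis=-1) == targets)     # whether the argmax token is right
    brier = np.mean((conf - correct.astype(float)) ** 2)  # penalize miscalibrated confidence
    return float(ce + lam * brier)

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
print(uncertainty_aware_loss(logits, targets=np.array([0, 2])))
```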
arXiv Detail & Related papers (2024-12-03T23:14:47Z)
- ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
We introduce a novel uncertainty measure based on self-consistency theory.
We then develop a conformal uncertainty criterion by integrating an uncertainty condition aligned with correctness into the conformal prediction (CP) algorithm.
Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods.
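For orientation, the sketch below shows the generic split-conformal recipe that such correctness-coverage criteria build on: score answers by their self-consistency frequency across samples, calibrate a threshold on held-out questions with known answers, and emit a prediction set. It is an illustration of the general recipe, not the paper's exact criterion.

```python
# Generic split-conformal prediction with a self-consistency-based nonconformity score.
import numpy as np

def nonconformity(freq_of_answer: float) -> float:
    """Lower frequency among sampled generations -> higher nonconformity."""
    return 1.0 - freq_of_answer

def calibrate(cal_freqs: list, alpha: float = 0.1) -> float:
    """Finite-sample-corrected quantile of calibration nonconformity scores."""
    scores = np.array([nonconformity(f) for f in cal_freqs])
    n = len(scores)
    q = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, q))

def prediction_set(candidate_freqs: dict, qhat: float) -> set:
    """All candidate answers whose nonconformity falls below the calibrated threshold."""
    return {a for a, f in candidate_freqs.items() if nonconformity(f) <= qhat}

# Calibration: frequency of the *correct* answer among samples for held-out questions.
qhat = calibrate([0.8, 0.6, 0.9, 0.7, 0.5, 0.85], alpha=0.2)
# Test question: answer frequencies across, say, 20 sampled generations.
print(prediction_set({"42": 0.65, "41": 0.25, "40": 0.10}, qhat))
```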
arXiv Detail & Related papers (2024-06-29T17:33:07Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Quantifying uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
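At its core, KLE scores uncertainty via the von Neumann entropy of a kernel built from pairwise semantic similarities between sampled answers. A minimal sketch of that computation follows, with the similarity function stubbed out (a real system would use an NLI or embedding model).

```python
# Sketch: von Neumann entropy of a semantic-similarity kernel over sampled answers.
import numpy as np

def von_neumann_entropy(K: np.ndarray) -> float:
    """Entropy of a positive semi-definite kernel normalized to unit trace."""
    K = K / np.trace(K)
    eigvals = np.linalg.eigvalsh(K)
    eigvals = eigvals[eigvals > 1e-12]
    return float(-np.sum(eigvals * np.log(eigvals)))

def kernel_language_entropy(answers: list, similarity) -> float:
    """Build the pairwise similarity kernel over sampled answers and score its entropy."""
    n = len(answers)
    K = np.array([[similarity(answers[i], answers[j]) for j in range(n)] for i in range(n)])
    return von_neumann_entropy(K)

# Placeholder similarity: 1.0 if the strings match, 0.2 otherwise (toy stand-in only).
toy_sim = lambda a, b: 1.0 if a == b else 0.2
print(kernel_language_entropy(["Paris", "Paris", "Lyon", "Paris"], toy_sim))
```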
arXiv Detail & Related papers (2024-05-30T12:42:05Z) - SPUQ: Perturbation-Based Uncertainty Quantification for Large Language
Models [9.817185255633758]
Large language models (LLMs) have become increasingly prevalent, offering remarkable text generation capabilities.
A pressing challenge is their tendency to make confidently wrong predictions.
We introduce a novel UQ method, sampling with perturbation for UQ (SPUQ), designed to tackle both aleatoric and epistemic uncertainties.
Our findings show a substantial improvement in model calibration, with a reduction in Expected Calibration Error (ECE) of 50% on average.
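For reference, ECE is the standard binned gap between confidence and accuracy; a minimal implementation is sketched below (independent of SPUQ itself).

```python
# Standard equal-width-binned Expected Calibration Error (ECE).
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy across bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

# Toy check: overconfident predictions produce a nonzero ECE.
print(expected_calibration_error([0.9, 0.8, 0.95, 0.7], [1, 0, 1, 0]))
```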
arXiv Detail & Related papers (2024-03-04T21:55:22Z) - Uncertainty Quantification for Forward and Inverse Problems of PDEs via
Latent Global Evolution [110.99891169486366]
We propose a method that integrates efficient and precise uncertainty quantification into a deep learning-based surrogate model.
Our method endows deep learning-based surrogate models with robust and efficient uncertainty quantification capabilities for both forward and inverse problems.
Our method excels at propagating uncertainty over extended auto-regressive rollouts, making it suitable for scenarios involving long-term predictions.
arXiv Detail & Related papers (2024-02-13T11:22:59Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
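A minimal sketch of the ensemble-and-decompose step described above: run the model once per clarified input, average the answer distributions, and split total entropy into an expected per-clarification part and a disagreement part. The clarifier and the LLM are stubbed here with fixed distributions, so the numbers are purely illustrative.

```python
# Sketch: entropy decomposition over an ensemble of clarified inputs.
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

def decompose_uncertainty(per_clarification_probs: list):
    """Total = entropy of the ensembled prediction; data = mean per-member entropy;
    the gap between them measures disagreement across clarifications."""
    mixture = np.mean(per_clarification_probs, axis=0)
    total = entropy(mixture)
    data = float(np.mean([entropy(p) for p in per_clarification_probs]))
    return total, data, total - data

# Two clarifications of an ambiguous question yield different answer distributions.
probs = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
print(decompose_uncertainty(probs))
```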
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.