Rethinking Uncertainty Estimation in Natural Language Generation
- URL: http://arxiv.org/abs/2412.15176v1
- Date: Thu, 19 Dec 2024 18:51:06 GMT
- Title: Rethinking Uncertainty Estimation in Natural Language Generation
- Authors: Lukas Aichberger, Kajetan Schweighofer, Sepp Hochreiter
- Abstract summary: Large Language Models (LLMs) are increasingly employed in real-world applications.
Uncertainty estimation methods generate and analyze multiple output sequences to determine the LLM's uncertainty.
We propose G-NLL, which has the advantage of being obtained using only a single output sequence.
- Score: 6.3398383724486544
- Abstract: Large Language Models (LLMs) are increasingly employed in real-world applications, driving the need to evaluate the trustworthiness of their generated text. To this end, reliable uncertainty estimation is essential. Since current LLMs generate text autoregressively through a stochastic process, the same prompt can lead to varying outputs. Consequently, leading uncertainty estimation methods generate and analyze multiple output sequences to determine the LLM's uncertainty. However, generating output sequences is computationally expensive, making these methods impractical at scale. In this work, we inspect the theoretical foundations of the leading methods and explore new directions to enhance their computational efficiency. Building on the framework of proper scoring rules, we find that the negative log-likelihood of the most likely output sequence constitutes a theoretically grounded uncertainty measure. To approximate this alternative measure, we propose G-NLL, which has the advantage of being obtained using only a single output sequence generated by greedy decoding. This makes uncertainty estimation more efficient and straightforward, while preserving theoretical rigor. Empirical results demonstrate that G-NLL achieves state-of-the-art performance across various LLMs and tasks. Our work lays the foundation for efficient and reliable uncertainty estimation in natural language generation, challenging the necessity of more computationally involved methods currently leading the field.
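As an illustration of the G-NLL idea (not the authors' implementation): a minimal sketch that computes the negative log-likelihood of the greedily decoded sequence with Hugging Face transformers. The model name, prompt, and generation length are placeholder assumptions.

```python
# Sketch: estimate G-NLL as the negative log-likelihood of the greedy output.
# Assumes a Hugging Face causal LM; "gpt2" and max_new_tokens are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def g_nll(prompt: str, max_new_tokens: int = 32) -> float:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            do_sample=False,              # greedy decoding: most likely token at each step
            max_new_tokens=max_new_tokens,
            return_dict_in_generate=True,
            output_scores=True,
        )
    # Log-probabilities of the tokens that greedy decoding actually picked.
    gen_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    nll = 0.0
    for step_scores, token_id in zip(out.scores, gen_tokens):
        log_probs = torch.log_softmax(step_scores[0], dim=-1)
        nll -= log_probs[token_id].item()
    return nll  # higher value = higher estimated uncertainty

print(g_nll("The capital of France is"))
```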
Related papers
- Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation [0.0]
We explore uncertainty estimation as a proxy for correctness in LLM-generated code.
We adapt two state-of-the-art techniques from natural language generation.
We develop an abstention policy that prevents the model from making predictions when uncertainty is high.
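Such a policy reduces to a simple thresholding rule. The sketch below is illustrative only, with a hypothetical threshold and a pluggable uncertainty function rather than the paper's specific estimator.

```python
# Sketch: abstain from returning an answer when estimated uncertainty is high.
# `estimate_uncertainty` stands in for any sequence-level measure (e.g., G-NLL).
from typing import Callable, Optional

def answer_or_abstain(
    prompt: str,
    generate: Callable[[str], str],
    estimate_uncertainty: Callable[[str, str], float],
    threshold: float = 5.0,  # hypothetical; would be tuned on a validation set
) -> Optional[str]:
    answer = generate(prompt)
    if estimate_uncertainty(prompt, answer) > threshold:
        return None  # abstain instead of risking an incorrect prediction
    return answer
```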
arXiv Detail & Related papers (2025-02-17T10:03:01Z)
- ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
We introduce a novel uncertainty measure based on self-consistency theory.
We then develop a conformal uncertainty criterion by integrating an uncertainty condition aligned with correctness into the conformal prediction (CP) algorithm.
Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods.
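The sketch below is not the ConU algorithm itself; it only illustrates generic split-conformal calibration of an uncertainty threshold against a correctness-coverage target, with made-up calibration data.

```python
# Sketch: split-conformal calibration of an uncertainty threshold.
# Calibration data: (uncertainty score, correct?) pairs for held-out prompts.
import numpy as np

def conformal_threshold(cal_scores, cal_correct, alpha: float = 0.1) -> float:
    """Return a score threshold aiming for >= 1 - alpha coverage of correct answers."""
    # Nonconformity scores: uncertainties of the *correct* calibration answers.
    scores = np.asarray([s for s, ok in zip(cal_scores, cal_correct) if ok])
    n = len(scores)
    # Finite-sample-adjusted quantile level, as in standard split conformal prediction.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, q_level, method="higher"))

# Usage: keep an answer only if its uncertainty is <= the calibrated threshold.
tau = conformal_threshold(cal_scores=[0.2, 1.5, 0.7, 3.1], cal_correct=[True, True, False, True])
```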
arXiv Detail & Related papers (2024-06-29T17:33:07Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Quantifying uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
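As a rough illustration of the kernel-entropy idea (not the paper's exact estimator): a von Neumann entropy can be computed from a unit-trace semantic-similarity kernel over sampled answers. The similarity matrix is assumed to be given, e.g. from an embedding or NLI model.

```python
# Sketch: von Neumann entropy of a unit-trace semantic kernel over sampled answers.
# K is a symmetric positive semi-definite matrix of pairwise semantic similarities.
import numpy as np

def von_neumann_entropy(K: np.ndarray) -> float:
    K = K / np.trace(K)                      # normalize to unit trace (density-matrix form)
    eigvals = np.linalg.eigvalsh(K)          # real eigenvalues of a symmetric matrix
    eigvals = eigvals[eigvals > 1e-12]       # drop numerical zeros before taking logs
    return float(-np.sum(eigvals * np.log(eigvals)))

# Example: three sampled answers, two of them semantically near-identical.
K = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
print(von_neumann_entropy(K))  # higher entropy = more semantic disagreement = more uncertainty
```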
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
- Language Model Cascades: Token-level uncertainty and beyond [65.38515344964647]
Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks.
Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs.
We show that incorporating token-level uncertainty through learned post-hoc deferral rules can significantly outperform simple aggregation strategies.
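A minimal sketch of a token-level deferral rule in a two-model cascade follows; the quantile aggregation and fixed threshold are simplifications of the learned post-hoc deferral rules described in the paper.

```python
# Sketch: two-model cascade that defers to the larger model when the small
# model's token-level uncertainty (here, a quantile of per-token NLLs) is high.
import numpy as np
from typing import Callable, List, Tuple

def cascade_answer(
    prompt: str,
    small_generate: Callable[[str], Tuple[str, List[float]]],  # returns (answer, per-token log-probs)
    large_generate: Callable[[str], str],
    quantile: float = 0.9,    # look at the most uncertain tokens, not just the mean
    threshold: float = 2.0,   # hypothetical deferral threshold
) -> str:
    answer, token_logprobs = small_generate(prompt)
    token_nlls = -np.asarray(token_logprobs)
    if np.quantile(token_nlls, quantile) > threshold:
        return large_generate(prompt)  # defer the hard case to the larger model
    return answer
```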
arXiv Detail & Related papers (2024-04-15T21:02:48Z)
- Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification [116.77055746066375]
Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output.
We propose a novel fact-checking and hallucination detection pipeline based on token-level uncertainty quantification.
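The pipeline's actual claim-level uncertainty score is not reproduced here; the sketch below only illustrates the general idea of aggregating token-level log-probabilities per claim and flagging low-confidence claims, with hypothetical spans and threshold.

```python
# Sketch: flag claims whose tokens the model itself assigned low probability.
# Each claim is given as the (start, end) token span it occupies in the output.
import numpy as np
from typing import List, Tuple

def flag_uncertain_claims(
    token_logprobs: List[float],
    claim_spans: List[Tuple[int, int]],
    threshold: float = 1.0,   # hypothetical mean-NLL threshold per claim
) -> List[bool]:
    nlls = -np.asarray(token_logprobs)
    return [bool(nlls[start:end].mean() > threshold) for start, end in claim_spans]

# Usage: two claims; the second one is built from low-probability tokens.
print(flag_uncertain_claims([-0.1, -0.2, -2.5, -3.0], [(0, 2), (2, 4)]))  # [False, True]
```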
arXiv Detail & Related papers (2024-03-07T17:44:17Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
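A minimal sketch of one way such an ensemble could be decomposed, using the standard split of total predictive entropy into within- and between-clarification parts; the answer distributions are assumed given, and this may differ from the paper's exact estimator.

```python
# Sketch: ensemble answer distributions over input clarifications and split the
# total predictive entropy into a within-clarification part (model uncertainty
# given a clear input) and a between-clarification part (input ambiguity).
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def decompose(answer_dists: np.ndarray):
    """answer_dists: shape (num_clarifications, num_answers), each row a distribution."""
    mean_dist = answer_dists.mean(axis=0)                          # ensembled prediction
    total = entropy(mean_dist)                                     # total uncertainty
    within = float(np.mean([entropy(p) for p in answer_dists]))    # average per-clarification uncertainty
    between = total - within                                       # uncertainty due to input ambiguity
    return total, within, between

# Two clarifications that lead to confident but conflicting answers -> ambiguity dominates.
dists = np.array([[0.95, 0.05], [0.05, 0.95]])
print(decompose(dists))
```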
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- A Learning-Based Optimal Uncertainty Quantification Method and Its Application to Ballistic Impact Problems [1.713291434132985]
This paper concerns the optimal (supremum and infimum) uncertainty bounds for systems where the input (or prior) measure is only partially/imperfectly known.
We demonstrate the learning-based framework on the uncertainty optimization problem.
We show that the approach can be used to construct maps for performance certification and safety analysis in engineering practice.
arXiv Detail & Related papers (2022-12-28T14:30:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.