UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models
- URL: http://arxiv.org/abs/2505.19060v1
- Date: Sun, 25 May 2025 09:30:43 GMT
- Title: UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models
- Authors: Roman Vashurin, Maiya Goloburda, Preslav Nakov, Maxim Panov
- Abstract summary: We show that UNCERTAINTY-LINE consistently improves uncertainty estimates over even nominally length-normalized UQ methods. Our method is post-hoc, model-agnostic, and applicable to a range of UQ measures.
- Score: 34.52549605613087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have become indispensable tools across various applications, making it more important than ever to ensure the quality and trustworthiness of their outputs. This has led to growing interest in uncertainty quantification (UQ) methods for assessing the reliability of LLM outputs. Many existing UQ techniques rely on token probabilities, which inadvertently introduces a bias with respect to the length of the output. While some methods attempt to account for this, we demonstrate that such biases persist even in length-normalized approaches. To address the problem, we propose UNCERTAINTY-LINE (Length-INvariant Estimation), a simple debiasing procedure that regresses uncertainty scores on output length and uses the residuals as corrected, length-invariant estimates. Our method is post-hoc, model-agnostic, and applicable to a range of UQ measures. Through extensive evaluation on machine translation, summarization, and question-answering tasks, we demonstrate that UNCERTAINTY-LINE consistently improves uncertainty estimates over even nominally length-normalized UQ methods, across multiple metrics and models.
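The described debiasing step is simple enough to sketch directly. Below is a minimal NumPy illustration, a sketch assuming a plain linear fit with an intercept; the function name is illustrative and the abstract does not pin down the exact regressor:

```python
import numpy as np

def length_invariant_scores(uncertainty: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Regress raw uncertainty scores on output length and return the
    residuals as corrected, length-invariant uncertainty estimates."""
    # Design matrix with an intercept term; a simple linear fit is assumed
    # here -- the abstract leaves the exact form of the regression open.
    X = np.column_stack([np.ones(len(lengths)), lengths.astype(float)])
    coef, *_ = np.linalg.lstsq(X, uncertainty.astype(float), rcond=None)
    # Whatever part of the score length cannot explain is kept as the estimate.
    return uncertainty - X @ coef
```

Because the correction only needs the scores and the output lengths, it can be applied after the fact to any UQ measure, which is what makes the procedure post-hoc and model-agnostic.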
Related papers
- UNCLE: Uncertainty Expressions in Long-Form Generation [48.7696074873262]
Large Language Models (LLMs) are prone to hallucination, particularly in long-form generations. We introduce UNCLE, a benchmark designed to evaluate uncertainty expression in both long- and short-form question answering (QA). Our dataset is the first to directly bridge short- and long-form QA with paired questions and gold-standard answers.
arXiv Detail & Related papers (2025-05-22T17:16:08Z)
- Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results [10.551985027162576]
We show that commonly used correctness functions bias UQ evaluations by inflating the performance of certain UQ methods. We evaluate 7 correctness functions -- from lexical-based and embedding-based metrics to LLM-as-a-judge approaches.
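To make the reported effect concrete, here is a hedged sketch (not the paper's code) of how the same uncertainty scores can earn different AUROC values depending on which correctness function supplies the error labels:

```python
import numpy as np

def auroc(uncertainty: np.ndarray, is_error: np.ndarray) -> float:
    """AUROC of an uncertainty score as a detector of incorrect outputs
    (is_error: 1 = incorrect, 0 = correct)."""
    pos = uncertainty[is_error == 1]   # scores of incorrect outputs
    neg = uncertainty[is_error == 0]   # scores of correct outputs
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

# The ranking of UQ methods can flip between, e.g.,
# auroc(u, errors_from_lexical_match) and auroc(u, errors_from_llm_judge),
# which is the kind of spurious interaction the paper documents.
```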
arXiv Detail & Related papers (2025-04-18T13:13:42Z)
- Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models [76.17975723711886]
Uncertainty quantification (UQ) is a prominent approach for eliciting truthful answers from large language models (LLMs). In this work, we adapt Mahalanobis Distance (MD) - a well-established UQ technique in classification tasks - for text generation. Our method extracts token embeddings from multiple layers of LLMs, computes MD scores for each token, and uses linear regression trained on these features to provide robust uncertainty scores.
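The per-token MD computation itself is standard. A minimal NumPy sketch under the summary's description (fit a Gaussian on training token embeddings, then score new tokens; all names are illustrative):

```python
import numpy as np

def fit_token_gaussian(train_embs: np.ndarray):
    """Fit a centroid and regularized inverse covariance on training
    token embeddings of shape (num_tokens, dim)."""
    mu = train_embs.mean(axis=0)
    cov = np.cov(train_embs, rowvar=False) + 1e-6 * np.eye(train_embs.shape[1])
    return mu, np.linalg.inv(cov)

def token_md_scores(token_embs: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each token embedding to the centroid."""
    diff = token_embs - mu
    return np.einsum("td,dk,tk->t", diff, cov_inv, diff)
```

Per the summary, such scores, computed across several layers, then serve as features for a linear regression that outputs the final uncertainty score.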
arXiv Detail & Related papers (2025-02-20T10:25:13Z)
- Uncertainty Quantification for LLMs through Minimum Bayes Risk: Bridging Confidence and Consistency [66.9354890840418]
Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches. We propose a novel approach to integrating model confidence with output consistency, resulting in a family of efficient and robust UQ methods. We evaluate our approach across various tasks such as question answering, abstractive summarization, and machine translation.
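The abstract does not spell out the estimators, but one hedged sketch of a confidence-weighted consistency score, an assumption in the spirit of Minimum Bayes Risk over sampled outputs, is:

```python
import numpy as np

def confidence_weighted_uncertainty(sim: np.ndarray, conf: np.ndarray) -> np.ndarray:
    """sim[i, j]: similarity in [0, 1] between sampled outputs i and j;
    conf[j]: model confidence of sample j. Uncertainty of sample i is its
    expected dissimilarity to the other samples, with the expectation
    weighted by their confidence. This blend is illustrative, not the
    paper's exact estimator."""
    w = conf / conf.sum()   # normalize confidences into a distribution
    return 1.0 - sim @ w    # low expected similarity -> high uncertainty
```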
arXiv Detail & Related papers (2025-02-07T14:30:12Z)
- Legitimate ground-truth-free metrics for deep uncertainty classification scoring [3.9599054392856483]
The use of Uncertainty Quantification (UQ) methods in production remains limited. This limitation is exacerbated by the challenge of validating UQ methods in the absence of UQ ground truth. This paper investigates such metrics and proves that they are theoretically well-behaved and actually tied to some uncertainty ground truth.
arXiv Detail & Related papers (2024-10-30T14:14:32Z)
- Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models [96.43562963756975]
We train a regression model whose target variable is the gap between the conditional and the unconditional generation confidence.
We use this learned conditional dependency model to modulate the uncertainty of the current generation step based on the uncertainty of the previous step.
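As a rough illustration of that modulation step (the combination rule below is an assumption; the summary does not specify it), the predicted gap could act as an interpolation weight between consecutive step uncertainties:

```python
def modulated_step_uncertainty(u_curr: float, u_prev: float, predicted_gap: float) -> float:
    """Blend the current step's uncertainty with the previous step's, using
    the regression model's predicted conditional-unconditional confidence
    gap as the mixing weight. Illustrative only, not the paper's rule."""
    alpha = min(max(predicted_gap, 0.0), 1.0)  # clamp the predicted gap to [0, 1]
    return (1.0 - alpha) * u_curr + alpha * u_prev
```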
arXiv Detail & Related papers (2024-08-20T09:42:26Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models [9.817185255633758]
Large language models (LLMs) have become increasingly prevalent, offering remarkable text generation capabilities.
A pressing challenge is their tendency to make confidently wrong predictions.
We introduce a novel UQ method, sampling with perturbation for UQ (SPUQ), designed to tackle both aleatoric and epistemic uncertainties.
Our findings show a substantial improvement in model calibration, with a reduction in Expected Calibration Error (ECE) by 50% on average.
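A hedged sketch of the sampling-with-perturbation idea follows; the callables and the exact-match agreement measure are placeholders, not the paper's implementation:

```python
from typing import Callable

def perturbation_confidence(prompt: str,
                            generate: Callable[[str], str],
                            perturb: Callable[[str], str],
                            n_samples: int = 5) -> float:
    """Query the model on several perturbed versions of the prompt and use
    agreement among the answers as a confidence estimate; input perturbations
    probe epistemic uncertainty, while sampling probes aleatoric uncertainty."""
    answers = [generate(perturb(prompt)) for _ in range(n_samples)]
    reference = answers[0]
    # Exact-match agreement is a crude stand-in for a softer text-similarity
    # aggregation; low agreement signals high uncertainty.
    return sum(a == reference for a in answers) / n_samples
```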
arXiv Detail & Related papers (2024-03-04T21:55:22Z)
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Towards Clear Expectations for Uncertainty Estimation [64.20262246029286]
Uncertainty Quantification (UQ) is crucial to achieving trustworthy Machine Learning (ML).
Most UQ methods suffer from disparate and inconsistent evaluation protocols.
This opinion paper offers a new perspective by specifying those requirements through five downstream tasks.
arXiv Detail & Related papers (2022-07-27T07:50:57Z)