Towards Clear Expectations for Uncertainty Estimation
- URL: http://arxiv.org/abs/2207.13341v1
- Date: Wed, 27 Jul 2022 07:50:57 GMT
- Title: Towards Clear Expectations for Uncertainty Estimation
- Authors: Victor Bouvier, Simona Maggio, Alexandre Abraham, Léo Dreyfus-Schmidt
- Abstract summary: Uncertainty Quantification (UQ) is crucial to achieving trustworthy Machine Learning (ML).
Most UQ methods suffer from disparate and inconsistent evaluation protocols.
This opinion paper offers a new perspective by specifying the community's requirements through five downstream tasks.
- Score: 64.20262246029286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Uncertainty Quantification (UQ) is crucial to achieving trustworthy Machine Learning (ML), most UQ methods suffer from disparate and inconsistent evaluation protocols. We claim this inconsistency results from the unclear requirements the community expects from UQ. This opinion paper offers a new perspective by specifying those requirements through five downstream tasks where we expect uncertainty scores to have substantial predictive power. We design these downstream tasks carefully to reflect real-life usage of ML models. On an example benchmark of 7 classification datasets, we did not observe statistically significant superiority of state-of-the-art intrinsic UQ methods over simple baselines. We believe that our findings question the very rationale of why we quantify uncertainty and call for a standardized protocol for UQ evaluation based on metrics proven to be relevant for the ML practitioner.
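To make the downstream-task framing concrete, here is a minimal sketch of one such task, misclassification detection: an uncertainty score is judged by how well it predicts the model's errors on held-out data. The baseline shown (one minus the maximum softmax probability) is the kind of simple reference the paper compares against; all names and data here are illustrative, not the paper's actual benchmark.

```python
# Illustrative sketch: evaluating an uncertainty score on a downstream task.
# Task: misclassification detection -- does the score flag the inputs the
# model gets wrong? AUROC measures its predictive power for that task.
import numpy as np
from sklearn.metrics import roc_auc_score

def misclassification_auroc(probs, labels, uncertainty):
    """AUROC of an uncertainty score at detecting model errors.

    probs:       (n, k) predicted class probabilities
    labels:      (n,) true class indices
    uncertainty: (n,) scores where higher means less confident
    """
    errors = (probs.argmax(axis=1) != labels).astype(int)
    return roc_auc_score(errors, uncertainty)

# A simple baseline of the kind the paper benchmarks against:
# one minus the maximum softmax probability.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=1000)
labels = rng.integers(0, 5, size=1000)
baseline = 1.0 - probs.max(axis=1)
print(misclassification_auroc(probs, labels, baseline))
```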
Related papers
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [85.51252685938564]
Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML).
As with other ML models, large language models (LLMs) are prone to make incorrect predictions, "hallucinate" by fabricating claims, or simply generate low-quality output for a given input.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines, and provides an environment for controllable and consistent evaluation of novel techniques.
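As a hint of what such a collection of baselines contains, below is a minimal sketch of one classic information-based UQ score for LLM generations: length-normalized negative log-likelihood of the generated tokens. This is an illustrative stand-in, not the benchmark's actual API.

```python
# Hypothetical sketch of a standard LLM uncertainty baseline:
# length-normalized negative log-likelihood of a generation.
import numpy as np

def normalized_nll(token_logprobs):
    """Mean negative log-probability of the generated tokens.
    Higher values mean the model was less sure of its own output."""
    return -float(np.mean(token_logprobs))

# Token log-probs as an LLM API might report them for two generations.
confident = [-0.10, -0.20, -0.05, -0.10]
hesitant = [-2.30, -1.70, -3.00, -2.10]
assert normalized_nll(hesitant) > normalized_nll(confident)
```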
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE).
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
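A minimal sketch of that decomposition, assuming a hypothetical `llm_judge` callable that returns a 0/1 relevance pseudo-label for one query-item pair; any IR measure can then be computed from the judged list (precision@k shown here).

```python
# Hypothetical sketch: judge each ranked item independently with an LLM,
# then compute an IR evaluation measure from the pseudo-labels.
def precision_at_k(query, ranked_docs, llm_judge, k=10):
    pseudo_labels = [llm_judge(query, doc) for doc in ranked_docs[:k]]
    return sum(pseudo_labels) / k

# Usage with a trivial stand-in judge.
docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
judge = lambda query, doc: int(doc in {"doc_a", "doc_c"})
print(precision_at_k("some query", docs, judge, k=4))  # 0.5
```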
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
- Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond [63.969531254692725]
Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems.
We propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to semantic relevance.
We show that WSE exhibits superior performance in accurate uncertainty measurement under two standard criteria for correctness evaluation.
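One illustrative reading of that idea (not the paper's exact formulation): weight each token's entropy by a semantic-relevance score before aggregating, so uncertainty on filler words counts less than uncertainty on the terms that carry the answer.

```python
# Sketch under stated assumptions: relevance-weighted token entropies.
import numpy as np

def weighted_sequence_entropy(token_entropies, relevance_weights):
    h = np.asarray(token_entropies, dtype=float)
    w = np.asarray(relevance_weights, dtype=float)
    return float((w * h).sum() / w.sum())

# "The answer is pneumonia": uncertainty on the diagnosis term dominates.
entropies = [0.1, 0.2, 0.1, 1.5]
weights = [0.1, 0.1, 0.1, 1.0]
print(weighted_sequence_entropy(entropies, weights))  # ~1.18
```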
arXiv Detail & Related papers (2024-02-22T03:46:08Z)
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems.
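The standard way such a model is used for selective QA is to abstain above an uncertainty threshold and report the coverage/risk trade-off; a minimal sketch with illustrative data follows.

```python
# Sketch: selective question answering with an uncertainty threshold.
import numpy as np

def selective_qa(uncertainties, correct, threshold):
    u = np.asarray(uncertainties)
    c = np.asarray(correct, dtype=bool)
    answered = u < threshold    # abstain on high uncertainty
    coverage = answered.mean()  # fraction of questions answered
    risk = (~c[answered]).mean() if answered.any() else 0.0  # error rate among answered
    return coverage, risk

cov, risk = selective_qa([0.1, 0.9, 0.4, 0.7], [1, 0, 1, 0], threshold=0.5)
print(f"coverage={cov:.2f}, risk={risk:.2f}")  # answers 2 of 4, both correct
```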
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Comparing the quality of neural network uncertainty estimates for classification problems [0.0]
Uncertainty quantification (UQ) methods for deep learning (DL) models have received increased attention in the literature.
We use statistical methods of frequentist interval coverage and interval width to evaluate the quality of credible intervals.
We apply these different UQ methods for DL to a hyperspectral image target detection problem and show the inconsistency of their results.
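A minimal sketch of those two checks on synthetic data: empirical coverage (how often the intervals contain the truth) and mean interval width (how tight they are); names and numbers are illustrative.

```python
# Sketch: frequentist quality checks for predictive/credible intervals.
import numpy as np

def coverage_and_width(lower, upper, y_true):
    lower, upper, y = map(np.asarray, (lower, upper, y_true))
    covered = (lower <= y) & (y <= upper)
    return covered.mean(), (upper - lower).mean()

rng = np.random.default_rng(0)
y = rng.normal(size=500)
noise = rng.normal(scale=0.6, size=(2, 500))
lo, hi = y - 1.0 + noise[0], y + 1.0 + noise[1]
cov, width = coverage_and_width(lo, hi, y)
print(f"coverage={cov:.2f}, mean width={width:.2f}")
# A good UQ method has coverage near the nominal level with narrow intervals.
```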
arXiv Detail & Related papers (2023-08-11T01:55:14Z)
- Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ, by showing degradation in coverage and calibration.
We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
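A back-of-the-envelope sketch of the first point: marginal coverage is a prior-weighted mix of per-class coverages, so shifting the class priors changes it, and importance weights p_target(y)/p_source(y) let one recover the target coverage from source-distribution data. Numbers below are illustrative, not the paper's.

```python
# Sketch: why label shift degrades coverage, and how reweighting helps.
import numpy as np

per_class_cov = np.array([0.95, 0.80])  # class 1 is harder to cover
p_source = np.array([0.9, 0.1])
p_target = np.array([0.5, 0.5])         # label shift: class 1 becomes common

print(per_class_cov @ p_source)         # 0.935 coverage on the source mix
print(per_class_cov @ p_target)         # 0.875 after the shift: degraded

# Importance weights recover the target coverage from source data:
w = p_target / p_source
print((per_class_cov * p_source) @ w)   # 0.875 again, via reweighting
```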
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
- Uncertainty Quantification Using Neural Networks for Molecular Property Prediction [33.34534208450156]
We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics.
None of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets.
We conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
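One of the complementary metrics such evaluations rely on is error ranking: the rank correlation between predicted uncertainty and actual absolute error. A minimal sketch with synthetic data in which the error genuinely scales with the predicted uncertainty:

```python
# Sketch: does predicted uncertainty rank the actual errors well?
import numpy as np
from scipy.stats import spearmanr

def error_ranking_score(y_true, y_pred, sigma):
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    rho, _ = spearmanr(sigma, abs_err)
    return rho

rng = np.random.default_rng(0)
sigma = rng.uniform(0.1, 1.0, size=200)         # predicted uncertainties
y_true = rng.normal(size=200)
y_pred = y_true + sigma * rng.normal(size=200)  # error scales with sigma
print(error_ranking_score(y_true, y_pred, sigma))  # positive but well below 1
```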
arXiv Detail & Related papers (2020-05-20T13:31:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.