Towards Clear Expectations for Uncertainty Estimation
- URL: http://arxiv.org/abs/2207.13341v1
- Date: Wed, 27 Jul 2022 07:50:57 GMT
- Title: Towards Clear Expectations for Uncertainty Estimation
- Authors: Victor Bouvier, Simona Maggio, Alexandre Abraham, Léo Dreyfus-Schmidt
- Abstract summary: Uncertainty Quantification (UQ) is crucial to achieving trustworthy Machine Learning (ML).
Most UQ methods suffer from disparate and inconsistent evaluation protocols.
This opinion paper offers a new perspective by specifying those requirements through five downstream tasks.
- Score: 64.20262246029286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Uncertainty Quantification (UQ) is crucial to achieving trustworthy Machine Learning (ML), most UQ methods suffer from disparate and inconsistent evaluation protocols. We claim this inconsistency results from the unclear requirements the community expects from UQ. This opinion paper offers a new perspective by specifying those requirements through five downstream tasks where we expect uncertainty scores to have substantial predictive power. We design these downstream tasks carefully to reflect real-life usage of ML models. On an example benchmark of 7 classification datasets, we did not observe statistical superiority of state-of-the-art intrinsic UQ methods over simple baselines. We believe that our findings question the very rationale of why we quantify uncertainty, and we call for a standardized protocol for UQ evaluation based on metrics proven to be relevant to the ML practitioner.
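The abstract does not name a single metric, but a common way to test whether an uncertainty score has "predictive power" on a downstream task is misclassification detection: rank held-out points by uncertainty and check how well that ranking separates correct from incorrect predictions. A minimal sketch, using the maximum softmax probability as a simple baseline score and AUROC as the figure of merit; the specific task, dataset, and score here are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy setup: any probabilistic classifier would do here.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# Simple baseline uncertainty score: 1 - max softmax probability.
uncertainty = 1.0 - proba.max(axis=1)

# Downstream task: does the score predict which predictions are wrong?
is_error = (proba.argmax(axis=1) != y_test).astype(int)
print("Misclassification-detection AUROC:", roc_auc_score(is_error, uncertainty))
```

Any intrinsic UQ method can be dropped into the same evaluation by replacing the `uncertainty` array, which is what makes this kind of downstream task a level playing field for comparing methods against simple baselines.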
Related papers
- Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI [47.64301863399763]
We present a dynamic semantic clustering approach inspired by the Chinese Restaurant Process.
We quantify the uncertainty of Large Language Models (LLMs) on a given query by calculating the entropy of the generated semantic clusters.
We propose leveraging the (negative) likelihood of these clusters as the (non)conformity score within the Conformal Prediction framework.
arXiv Detail & Related papers (2024-11-04T18:49:46Z)
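A rough illustration of the entropy-over-clusters idea summarized above: once sampled responses to a query have been grouped into semantic clusters, uncertainty can be read off as the entropy of the cluster distribution, and a cluster's (negative) empirical likelihood can serve as a (non)conformity score. The clustering step itself is omitted; cluster assignments are assumed as input, and this is a sketch rather than the paper's implementation:

```python
import numpy as np
from collections import Counter

def cluster_entropy(cluster_ids):
    """Entropy (in nats) of the empirical distribution over semantic clusters."""
    counts = np.array(list(Counter(cluster_ids).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def nonconformity(cluster_ids, cluster_of_answer):
    """Negative empirical likelihood of the cluster containing a candidate answer."""
    p = Counter(cluster_ids)[cluster_of_answer] / len(cluster_ids)
    return -p

# Ten sampled responses to one query, already grouped into semantic clusters.
clusters = [0, 0, 0, 1, 0, 2, 0, 1, 0, 0]
print("uncertainty (entropy):", cluster_entropy(clusters))
print("nonconformity of cluster 0:", nonconformity(clusters, 0))
```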
- Legitimate ground-truth-free metrics for deep uncertainty classification scoring [3.9599054392856483]
The use of Uncertainty Quantification (UQ) methods in production remains limited.
This limitation is exacerbated by the challenge of validating UQ methods in the absence of UQ ground truth.
This paper investigates such ground-truth-free metrics and proves that they are theoretically well-behaved and actually tied to some uncertainty ground truth.
arXiv Detail & Related papers (2024-10-30T14:14:32Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE).
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
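The per-item decomposition described above makes downstream IR measures straightforward to compute from the generated relevance judgments used as pseudo-labels. A minimal sketch, using precision@k as a stand-in for "any IR evaluation measure"; the choice of measure and the binary labels are illustrative assumptions:

```python
def precision_at_k(pseudo_relevance, k):
    """Precision@k from per-item pseudo-relevance labels (1 = judged relevant)."""
    top_k = pseudo_relevance[:k]
    return sum(top_k) / k

# Pseudo-labels predicted independently for each item of a ranked list.
ranked_list_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print("P@5 =", precision_at_k(ranked_list_labels, 5))  # 0.6
```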
- Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond [52.246494389096654]
This paper introduces Word-Sequence Entropy (WSE), a method that calibrates uncertainty at both the word and sequence levels.
We compare WSE with six baseline methods on five free-form medical QA datasets, utilizing seven popular large language models (LLMs).
arXiv Detail & Related papers (2024-02-22T03:46:08Z)
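WSE itself involves more machinery than can be shown here; the sketch below only illustrates the basic quantities it builds on: a word-level entropy computed from each next-token distribution, and a sequence-level score aggregated from the probabilities of the generated tokens. The actual method's weighting scheme is not reproduced, and all numbers are placeholders:

```python
import numpy as np

def word_entropy(token_probs):
    """Entropy of one next-token distribution (word-level uncertainty)."""
    p = np.asarray(token_probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def sequence_entropy(chosen_token_probs):
    """Average negative log-probability of the generated tokens (sequence level)."""
    p = np.asarray(chosen_token_probs, dtype=float)
    return float(-np.log(p).mean())

# Toy distributions over a 4-token vocabulary at three generation steps,
# and the probability the model assigned to the token it actually emitted.
step_dists = [[0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25]]
chosen = [0.7, 0.4, 0.25]

print("word-level entropies:", [round(word_entropy(d), 3) for d in step_dists])
print("sequence-level score:", round(sequence_entropy(chosen), 3))
```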
- Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ by showing degradation in coverage and calibration.
We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
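To make the "label shift hurts coverage" point concrete, here is a small synthetic sketch: a prediction-set threshold is calibrated on data drawn with one class balance, then empirical coverage is measured on a test set with a different class balance. The data generator, model, and simplified split-conformal recipe are illustrative assumptions, not the paper's construction; because one class is harder than the other, coverage on the shifted test set typically falls below the nominal 90%:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, p_pos):
    """Class 0 is tight/easy, class 1 is spread-out/hard; p_pos sets the label balance."""
    y = (rng.random(n) < p_pos).astype(int)
    x = np.where(y == 1, rng.normal(1.0, 3.0, n), rng.normal(-1.0, 0.5, n))
    return x.reshape(-1, 1), y

# Train and calibrate on a 50/50 label balance.
X_tr, y_tr = sample(4000, 0.5)
X_cal, y_cal = sample(2000, 0.5)
clf = LogisticRegression().fit(X_tr, y_tr)

# Split-conformal-style threshold targeting 90% coverage
# (plain 0.9 quantile for brevity; score = 1 - probability of the true label).
cal_scores = 1.0 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]
q = np.quantile(cal_scores, 0.9)

def coverage(X, y):
    """Fraction of points whose true label lands in the prediction set."""
    in_set = (1.0 - clf.predict_proba(X)) <= q
    return in_set[np.arange(len(y)), y].mean()

X_iid, y_iid = sample(2000, 0.5)      # same label balance as calibration
X_shift, y_shift = sample(2000, 0.9)  # label-shifted: the hard class dominates
print("coverage without label shift:", round(coverage(X_iid, y_iid), 3))
print("coverage under label shift:  ", round(coverage(X_shift, y_shift), 3))
```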
- Uncertainty Quantification Using Neural Networks for Molecular Property Prediction [33.34534208450156]
We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics.
None of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets.
We conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
arXiv Detail & Related papers (2020-05-20T13:31:20Z)
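The "reliable ranking of errors" criterion mentioned above is commonly checked by correlating predicted uncertainties with observed absolute errors on a held-out set, for example with Spearman's rank correlation. A minimal sketch of that check; the arrays are placeholders for a model's outputs, not data from the paper:

```python
import numpy as np
from scipy.stats import spearmanr

# Held-out predictions, their true targets, and a per-prediction uncertainty
# estimate (e.g. an ensemble's standard deviation); all values are illustrative.
y_true = np.array([0.2, 1.5, -0.3, 2.1, 0.9, -1.2])
y_pred = np.array([0.1, 1.1, -0.2, 2.6, 1.0, -0.3])
uncert = np.array([0.05, 0.30, 0.10, 0.40, 0.08, 0.70])

abs_err = np.abs(y_true - y_pred)
rho, _ = spearmanr(uncert, abs_err)
print("Spearman rank correlation between uncertainty and error:", round(rho, 3))
```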
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.