Towards Clear Expectations for Uncertainty Estimation
- URL: http://arxiv.org/abs/2207.13341v1
- Date: Wed, 27 Jul 2022 07:50:57 GMT
- Title: Towards Clear Expectations for Uncertainty Estimation
- Authors: Victor Bouvier, Simona Maggio, Alexandre Abraham, Léo Dreyfus-Schmidt
- Abstract summary: Uncertainty Quantification (UQ) is crucial to achieving trustworthy Machine Learning (ML).
Most UQ methods suffer from disparate and inconsistent evaluation protocols.
This opinion paper offers a new perspective by specifying the community's requirements through five downstream tasks.
- Score: 64.20262246029286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Uncertainty Quantification (UQ) is crucial to achieving trustworthy Machine Learning (ML), most UQ methods suffer from disparate and inconsistent evaluation protocols. We claim this inconsistency results from the unclear requirements the community expects from UQ. This opinion paper offers a new perspective by specifying those requirements through five downstream tasks where we expect uncertainty scores to have substantial predictive power. We design these downstream tasks carefully to reflect real-life usage of ML models. On an example benchmark of 7 classification datasets, we did not observe statistically significant superiority of state-of-the-art intrinsic UQ methods over simple baselines. We believe that our findings question the very rationale of why we quantify uncertainty and call for a standardized protocol for UQ evaluation based on metrics proven to be relevant for the ML practitioner.
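To make the downstream-task framing concrete, here is a minimal sketch of one such task, misclassification detection: an uncertainty score is judged by how well it predicts the model's errors on held-out data. The baseline shown (one minus the maximum softmax probability) is the kind of simple reference the paper compares against; all names and data here are illustrative, not the paper's actual benchmark.

```python
# Illustrative sketch: evaluating an uncertainty score on a downstream task.
# Task: misclassification detection -- does the score flag the inputs the
# model gets wrong? AUROC measures its predictive power for that task.
import numpy as np
from sklearn.metrics import roc_auc_score

def misclassification_auroc(probs, labels, uncertainty):
    """AUROC of an uncertainty score at detecting model errors.

    probs:       (n, k) predicted class probabilities
    labels:      (n,) true class indices
    uncertainty: (n,) scores where higher means less confident
    """
    errors = (probs.argmax(axis=1) != labels).astype(int)
    return roc_auc_score(errors, uncertainty)

# A simple baseline of the kind the paper benchmarks against:
# one minus the maximum softmax probability.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=1000)
labels = rng.integers(0, 5, size=1000)
baseline = 1.0 - probs.max(axis=1)
print(misclassification_auroc(probs, labels, baseline))
```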
Related papers
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [85.51252685938564]
Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML).
As with other ML models, large language models (LLMs) are prone to make incorrect predictions, "hallucinate" by fabricating claims, or simply generate low-quality output for a given input.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines, and provides an environment for controllable and consistent evaluation of novel techniques.
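As a hint of what such a collection of baselines contains, below is a minimal sketch of one classic information-based UQ score for LLM generations: length-normalized negative log-likelihood of the generated tokens. This is an illustrative stand-in, not the benchmark's actual API.

```python
# Hypothetical sketch of a standard LLM uncertainty baseline:
# length-normalized negative log-likelihood of a generation.
import numpy as np

def normalized_nll(token_logprobs):
    """Mean negative log-probability of the generated tokens.
    Higher values mean the model was less sure of its own output."""
    return -float(np.mean(token_logprobs))

# Token log-probs as an LLM API might report them for two generations.
confident = [-0.10, -0.20, -0.05, -0.10]
hesitant = [-2.30, -1.70, -3.00, -2.10]
assert normalized_nll(hesitant) > normalized_nll(confident)
```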
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE).
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
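A minimal sketch of that decomposition, assuming a hypothetical `llm_judge` callable that returns a 0/1 relevance pseudo-label for one query-item pair; any IR measure can then be computed from the judged list (precision@k shown here).

```python
# Hypothetical sketch: judge each ranked item independently with an LLM,
# then compute an IR evaluation measure from the pseudo-labels.
def precision_at_k(query, ranked_docs, llm_judge, k=10):
    pseudo_labels = [llm_judge(query, doc) for doc in ranked_docs[:k]]
    return sum(pseudo_labels) / k

# Usage with a trivial stand-in judge.
docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
judge = lambda query, doc: int(doc in {"doc_a", "doc_c"})
print(precision_at_k("some query", docs, judge, k=4))  # 0.5
```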
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
- Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond [63.969531254692725]
Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems.
We propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to semantic relevance.
We show that WSE exhibits superior performance in accurate uncertainty measurement under two standard criteria for correctness evaluation.
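One illustrative reading of that idea (not the paper's exact formulation): weight each token's entropy by a semantic-relevance score before aggregating, so uncertainty on filler words counts less than uncertainty on the terms that carry the answer.

```python
# Sketch under stated assumptions: relevance-weighted token entropies.
import numpy as np

def weighted_sequence_entropy(token_entropies, relevance_weights):
    h = np.asarray(token_entropies, dtype=float)
    w = np.asarray(relevance_weights, dtype=float)
    return float((w * h).sum() / w.sum())

# "The answer is pneumonia": uncertainty on the diagnosis term dominates.
entropies = [0.1, 0.2, 0.1, 1.5]
weights = [0.1, 0.1, 0.1, 1.0]
print(weighted_sequence_entropy(entropies, weights))  # ~1.18
```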
arXiv Detail & Related papers (2024-02-22T03:46:08Z)
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems.
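The standard way such a model is used for selective QA is to abstain above an uncertainty threshold and report the coverage/risk trade-off; a minimal sketch with illustrative data follows.

```python
# Sketch: selective question answering with an uncertainty threshold.
import numpy as np

def selective_qa(uncertainties, correct, threshold):
    u = np.asarray(uncertainties)
    c = np.asarray(correct, dtype=bool)
    answered = u < threshold    # abstain on high uncertainty
    coverage = answered.mean()  # fraction of questions answered
    risk = (~c[answered]).mean() if answered.any() else 0.0  # error rate among answered
    return coverage, risk

cov, risk = selective_qa([0.1, 0.9, 0.4, 0.7], [1, 0, 1, 0], threshold=0.5)
print(f"coverage={cov:.2f}, risk={risk:.2f}")  # answers 2 of 4, both correct
```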
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Comparing the quality of neural network uncertainty estimates for classification problems [0.0]
Uncertainty quantification (UQ) methods for deep learning (DL) models have received increased attention in the literature.
We use statistical methods of frequentist interval coverage and interval width to evaluate the quality of credible intervals.
We apply these different UQ methods for DL to a hyperspectral image target detection problem and show the inconsistency of their results.
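A minimal sketch of those two checks on synthetic data: empirical coverage (how often the intervals contain the truth) and mean interval width (how tight they are); names and numbers are illustrative.

```python
# Sketch: frequentist quality checks for predictive/credible intervals.
import numpy as np

def coverage_and_width(lower, upper, y_true):
    lower, upper, y = map(np.asarray, (lower, upper, y_true))
    covered = (lower <= y) & (y <= upper)
    return covered.mean(), (upper - lower).mean()

rng = np.random.default_rng(0)
y = rng.normal(size=500)
noise = rng.normal(scale=0.6, size=(2, 500))
lo, hi = y - 1.0 + noise[0], y + 1.0 + noise[1]
cov, width = coverage_and_width(lo, hi, y)
print(f"coverage={cov:.2f}, mean width={width:.2f}")
# A good UQ method has coverage near the nominal level with narrow intervals.
```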
arXiv Detail & Related papers (2023-08-11T01:55:14Z)
- Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ, by showing degradation in coverage and calibration.
We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
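A back-of-the-envelope sketch of the first point: marginal coverage is a prior-weighted mix of per-class coverages, so shifting the class priors changes it, and importance weights p_target(y)/p_source(y) let one recover the target coverage from source-distribution data. Numbers below are illustrative, not the paper's.

```python
# Sketch: why label shift degrades coverage, and how reweighting helps.
import numpy as np

per_class_cov = np.array([0.95, 0.80])  # class 1 is harder to cover
p_source = np.array([0.9, 0.1])
p_target = np.array([0.5, 0.5])         # label shift: class 1 becomes common

print(per_class_cov @ p_source)         # 0.935 coverage on the source mix
print(per_class_cov @ p_target)         # 0.875 after the shift: degraded

# Importance weights recover the target coverage from source data:
w = p_target / p_source
print((per_class_cov * p_source) @ w)   # 0.875 again, via reweighting
```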
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
- Uncertainty Quantification Using Neural Networks for Molecular Property Prediction [33.34534208450156]
We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics.
None of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets.
We conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
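One of the complementary metrics such evaluations rely on is error ranking: the rank correlation between predicted uncertainty and actual absolute error. A minimal sketch with synthetic data in which the error genuinely scales with the predicted uncertainty:

```python
# Sketch: does predicted uncertainty rank the actual errors well?
import numpy as np
from scipy.stats import spearmanr

def error_ranking_score(y_true, y_pred, sigma):
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    rho, _ = spearmanr(sigma, abs_err)
    return rho

rng = np.random.default_rng(0)
sigma = rng.uniform(0.1, 1.0, size=200)         # predicted uncertainties
y_true = rng.normal(size=200)
y_pred = y_true + sigma * rng.normal(size=200)  # error scales with sigma
print(error_ranking_score(y_true, y_pred, sigma))  # positive but well below 1
```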
arXiv Detail & Related papers (2020-05-20T13:31:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.