Related papers: MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty

URL: http://arxiv.org/abs/2408.06816v1
Date: Tue, 13 Aug 2024 11:17:31 GMT
Title: MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty
Authors: Yongjin Yang, Haneul Yoo, Hwaran Lee
Abstract summary: We propose a new Multi-Answer Question Answering dataset, MAQA, consisting of world knowledge, mathematical reasoning, and commonsense reasoning tasks. Our findings show that entropy and consistency-based methods estimate the model uncertainty well even under data uncertainty. We believe our observations will pave the way for future work on uncertainty quantification in realistic settings.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Although large language models (LLMs) are capable of performing various tasks, they still suffer from producing plausible but incorrect responses. To improve the reliability of LLMs, recent research has focused on uncertainty quantification to predict whether a response is correct or not. However, most uncertainty quantification methods have been evaluated on questions requiring a single clear answer, ignoring the existence of data uncertainty that arises from irreducible randomness. Instead, these methods only consider model uncertainty, which arises from a lack of knowledge. In this paper, we investigate previous uncertainty quantification methods in the presence of data uncertainty. Our contributions are two-fold: 1) proposing a new Multi-Answer Question Answering dataset, MAQA, consisting of world knowledge, mathematical reasoning, and commonsense reasoning tasks to evaluate uncertainty quantification regarding data uncertainty, and 2) assessing 5 uncertainty quantification methods on diverse white- and black-box LLMs. Our findings show that entropy and consistency-based methods estimate the model uncertainty well even under data uncertainty, while other methods for white- and black-box LLMs struggle depending on the tasks. Additionally, methods designed for white-box LLMs suffer from overconfidence in reasoning tasks compared to simple knowledge queries. We believe our observations will pave the way for future work on uncertainty quantification in realistic settings.
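The abstract refers to entropy and consistency-based methods without defining them here. As a rough illustration only, the sketch below computes two common sampling-based scores over a set of answers drawn from a model for the same question. The function names, the sample lists, and the exact definitions are assumptions made for this example and are not taken from the MAQA paper.

```python
import math
from collections import Counter


def answer_entropy(samples):
    """Shannon entropy (in nats) of the empirical distribution over sampled answers.

    Assumption: answers are compared by exact string match.
    """
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def consistency(samples):
    """Fraction of samples that agree with the most frequent answer."""
    counts = Counter(samples)
    return counts.most_common(1)[0][1] / len(samples)


# Hypothetical answers sampled from a model for one question.
confident = ["Paris", "Paris", "Paris", "Paris", "Paris"]
unsure = ["Paris", "Lyon", "Nice", "Paris", "Lille"]

print(answer_entropy(confident), consistency(confident))
print(answer_entropy(unsure), consistency(unsure))
```

Lower entropy and higher consistency are usually read as lower model uncertainty. For a question with several valid answers, spread across the sampled answers can also reflect data uncertainty, which is the case the abstract sets out to evaluate.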

Related papers

The Role of Model Confidence on Bias Effects in Measured Uncertainties [11.314633260055436]
We find that mitigating prompt-introduced bias improves uncertainty quantification in Visual Question Answering (VQA) tasks. We find that all considered biases induce greater changes in both uncertainties when bias-free model confidence is lower. These distinct effects deepen our understanding of bias mitigation for uncertainty quantification and potentially inform the development of more advanced techniques.
arXiv Detail & Related papers (2025-06-20T03:43:10Z)
Token-Level Uncertainty Estimation for Large Language Model Reasoning [24.56760223952017]
Large Language Models (LLMs) have demonstrated impressive capabilities, but their output quality remains inconsistent across various application scenarios. We propose a token-level uncertainty estimation framework to enable LLMs to self-assess and self-improve their generation quality in mathematical reasoning.
arXiv Detail & Related papers (2025-05-16T22:47:32Z)
A Survey of Uncertainty Estimation Methods on Large Language Models [12.268958536971782]
Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. These models could offer biased, hallucinated, or non-factual responses camouflaged by their fluency and realistic appearance. Uncertainty estimation is the key method to address this challenge.
arXiv Detail & Related papers (2025-02-28T20:38:39Z)
Estimating LLM Uncertainty with Evidence [66.51144261657983]
We present Logits-induced token uncertainty (LogTokU) as a framework for estimating decoupled token uncertainty in Large Language Models. We employ evidence modeling to implement LogTokU and use the estimated uncertainty to guide downstream tasks.
arXiv Detail & Related papers (2025-02-01T03:18:02Z)
Probabilistic Modeling of Disparity Uncertainty for Robust and Efficient Stereo Matching [61.73532883992135]
We propose a new uncertainty-aware stereo matching framework. We adopt Bayes risk as the measurement of uncertainty and use it to separately estimate data and model uncertainty.
arXiv Detail & Related papers (2024-12-24T23:28:20Z)
Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks [4.167519875804914]
We present a novel Question Rephrasing technique to evaluate the input uncertainty of large language models (LLMs). This technique is integrated with sampling methods that measure the output uncertainty of LLMs, thereby offering a more comprehensive uncertainty assessment.
arXiv Detail & Related papers (2024-08-07T12:38:23Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
A Structured Review of Literature on Uncertainty in Machine Learning & Deep Learning [0.8667724053232616]
We focus on a critical concern for adaptation of Machine Learning in risk-sensitive applications, namely understanding and quantifying uncertainty. Our paper approaches this topic in a structured way, providing a review of the literature in the various facets that uncertainty is enveloped in the ML process. Key contributions in this review are broadening the scope of uncertainty discussion, as well as an updated review of uncertainty quantification methods in Deep Learning.
arXiv Detail & Related papers (2024-06-01T07:17:38Z)
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important. We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach [6.209293868095268]
We study the problem of uncertainty estimation and calibration for LLMs. We propose a supervised approach that leverages labeled datasets to estimate the uncertainty in LLMs' responses. Our method is easy to implement and adaptable to different levels of model accessibility including black box, grey box, and white box.
arXiv Detail & Related papers (2024-04-24T17:10:35Z)
Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations [63.330182403615886]
A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability. Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety. In all three cases, models should ideally abstain from responding, much like humans, whose ability to understand uncertainty makes us refrain from answering questions we don't know.
arXiv Detail & Related papers (2024-04-16T23:56:38Z)
Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs). We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties. The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z)
One step closer to unbiased aleatoric uncertainty estimation [71.55174353766289]
We propose a new estimation method by actively de-noising the observed data. By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.
arXiv Detail & Related papers (2023-12-16T14:59:11Z)
Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge [35.067234242461545]
Large language models (LLMs) express uncertainty in situations where they lack sufficient parametric knowledge to generate reasonable responses. This work aims to systematically investigate LLMs' behaviors in such situations, emphasizing the trade-off between honesty and helpfulness.
arXiv Detail & Related papers (2023-11-16T10:02:40Z)
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability. In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling. Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
Quantifying Uncertainty in Natural Language Explanations of Large Language Models [29.34960984639281]
Large Language Models (LLMs) are increasingly used as powerful tools for high-stakes natural language processing (NLP) applications. We propose two novel metrics -- $\textit{Verbalized Uncertainty}$ and $\textit{Probing Uncertainty}$ -- to quantify the uncertainty of generated explanations. Our empirical analysis of benchmark datasets reveals that verbalized uncertainty is not a reliable estimate of explanation confidence.
arXiv Detail & Related papers (2023-11-06T21:14:40Z)
Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination". We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
Dense Uncertainty Estimation via an Ensemble-based Conditional Latent Variable Model [68.34559610536614]
We argue that the aleatoric uncertainty is an inherent attribute of the data and can only be correctly estimated with an unbiased oracle model. We propose a new sampling and selection strategy at train time to approximate the oracle model for aleatoric uncertainty estimation. Our results show that our solution achieves both accurate deterministic results and reliable uncertainty estimation.
arXiv Detail & Related papers (2021-11-22T08:54:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.