MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty
- URL: http://arxiv.org/abs/2408.06816v1
- Date: Tue, 13 Aug 2024 11:17:31 GMT
- Title: MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty
- Authors: Yongjin Yang, Haneul Yoo, Hwaran Lee,
- Abstract summary: We propose a new Multi-Answer Question Answering dataset, MAQA, consisting of world knowledge, mathematical reasoning, and commonsense reasoning tasks.
Our findings show that entropy and consistency-based methods estimate the model uncertainty well even under data uncertainty.
We believe our observations will pave the way for future work on uncertainty quantification in realistic setting.
- Score: 10.154013836043816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although large language models (LLMs) are capable of performing various tasks, they still suffer from producing plausible but incorrect responses. To improve the reliability of LLMs, recent research has focused on uncertainty quantification to predict whether a response is correct or not. However, most uncertainty quantification methods have been evaluated on questions requiring a single clear answer, ignoring the existence of data uncertainty that arises from irreducible randomness. Instead, these methods only consider model uncertainty, which arises from a lack of knowledge. In this paper, we investigate previous uncertainty quantification methods under the presence of data uncertainty. Our contributions are two-fold: 1) proposing a new Multi-Answer Question Answering dataset, MAQA, consisting of world knowledge, mathematical reasoning, and commonsense reasoning tasks to evaluate uncertainty quantification regarding data uncertainty, and 2) assessing 5 uncertainty quantification methods of diverse white- and black-box LLMs. Our findings show that entropy and consistency-based methods estimate the model uncertainty well even under data uncertainty, while other methods for white- and black-box LLMs struggle depending on the tasks. Additionally, methods designed for white-box LLMs suffer from overconfidence in reasoning tasks compared to simple knowledge queries. We believe our observations will pave the way for future work on uncertainty quantification in realistic setting.
Related papers
- Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks [4.167519875804914]
We present a novel Question Rephrasing technique to evaluate the input uncertainty of large language models (LLMs)
This technique is integrated with sampling methods that measure the output uncertainty of LLMs, thereby offering a more comprehensive uncertainty assessment.
arXiv Detail & Related papers (2024-08-07T12:38:23Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
arXiv Detail & Related papers (2024-05-30T12:42:05Z) - Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach [6.209293868095268]
We study the problem of uncertainty estimation and calibration for LLMs.
We propose a supervised approach that leverages labeled datasets to estimate the uncertainty in LLMs' responses.
Our method is easy to implement and adaptable to different levels of model accessibility including black box, grey box, and white box.
arXiv Detail & Related papers (2024-04-24T17:10:35Z) - Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations [63.330182403615886]
A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability.
Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety.
In all three cases, models should ideally abstain from responding, much like humans, whose ability to understand uncertainty makes us refrain from answering questions we don't know.
arXiv Detail & Related papers (2024-04-16T23:56:38Z) - Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs)
We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties.
The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z) - Examining LLMs' Uncertainty Expression Towards Questions Outside
Parametric Knowledge [35.067234242461545]
Large language models (LLMs) express uncertainty in situations where they lack sufficient parametric knowledge to generate reasonable responses.
This work aims to systematically investigate LLMs' behaviors in such situations, emphasizing the trade-off between honesty and helpfulness.
arXiv Detail & Related papers (2023-11-16T10:02:40Z) - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z) - Quantifying Uncertainty in Natural Language Explanations of Large
Language Models [29.34960984639281]
Large Language Models (LLMs) are increasingly used as powerful tools for high-stakes natural language processing (NLP) applications.
We propose two novel metrics -- $textitVerbalized Uncertainty$ and $textitProbing Uncertainty$ -- to quantify the uncertainty of generated explanations.
Our empirical analysis of benchmark datasets reveals that verbalized uncertainty is not a reliable estimate of explanation confidence.
arXiv Detail & Related papers (2023-11-06T21:14:40Z) - Improving the Reliability of Large Language Models by Leveraging
Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination"
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.