Related papers: DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction

URL: http://arxiv.org/abs/2412.09572v1
Date: Thu, 12 Dec 2024 18:52:40 GMT
Title: DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction
Authors: Yu Feng, Phu Mon Htut, Zheng Qi, Wei Xiao, Manuel Mager, Nikolaos Pappas, Kishaloy Halder, Yang Li, Yassine Benajiba, Dan Roth
Abstract summary: Existing methods, which gauge a model's uncertainty through evaluating self-consistency in responses to the original query, do not always capture true uncertainty. We propose a novel method, DiverseAgentEntropy, for evaluating a model's uncertainty using multi-agent interaction. Our method offers a more accurate prediction of the model's reliability and further detects hallucinations, outperforming other self-consistency-based methods.
Score: 53.803276766404494
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Quantifying the uncertainty in the factual parametric knowledge of Large Language Models (LLMs), especially in a black-box setting, poses a significant challenge. Existing methods, which gauge a model's uncertainty through evaluating self-consistency in responses to the original query, do not always capture true uncertainty. Models might respond consistently to the original query with a wrong answer, yet respond correctly to varied questions from different perspectives about the same query, and vice versa. In this paper, we propose a novel method, DiverseAgentEntropy, for evaluating a model's uncertainty using multi-agent interaction under the assumption that if a model is certain, it should consistently recall the answer to the original query across a diverse collection of questions about the same original query. We further implement an abstention policy to withhold responses when uncertainty is high. Our method offers a more accurate prediction of the model's reliability and further detects hallucinations, outperforming other self-consistency-based methods. Additionally, it demonstrates that existing models often fail to consistently retrieve the correct answer to the same query across diverse, varied questions even when they know the correct answer.
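
The core idea in the abstract (measure how consistently a model recalls one answer across varied questions, and abstain when uncertainty is high) can be illustrated with a minimal sketch. This is not the paper's procedure: it omits the multi-agent interaction entirely, and the exact-match normalization, the function names, and the threshold value are all assumptions made for illustration. It only shows entropy over a set of collected answers plus a simple abstention rule.

```python
import math
from collections import Counter


def normalize(answer):
    # Assumption: answers are compared by case-insensitive exact match.
    # A real system would need semantic matching of equivalent answers.
    return answer.strip().lower()


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution over answers."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def answer_or_abstain(answers, threshold=0.5):
    """Return (majority answer, entropy), or (None, entropy) to abstain.

    `threshold` is a hypothetical cutoff, not a value from the paper.
    """
    entropy = answer_entropy(answers)
    if entropy > threshold:
        return None, entropy
    counts = Counter(normalize(a) for a in answers)
    top_answer, _ = counts.most_common(1)[0]
    return top_answer, entropy


# Answers the same model gave to four differently phrased questions
# that all target the same underlying fact (made-up example data).
collected = ["Paris", "paris ", "Paris", "Lyon"]
print(answer_or_abstain(collected, threshold=0.5))
print(answer_or_abstain(collected, threshold=0.6))
```

With three matching answers and one dissenting answer the entropy is roughly 0.56 nats, so the stricter threshold abstains and the looser one returns the majority answer. Choosing the threshold is a trade-off between coverage and reliability that would have to be tuned on held-out data.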

Related papers

Variability Need Not Imply Error: The Case of Adequate but Semantically Distinct Responses [7.581259361859477]
Uncertainty quantification tools can be used to reject a response when the model is uncertain. We estimate the Probability the model assigns to Adequate Responses (PROBAR). We find PROBAR to outperform semantic entropy across prompts with varying degrees of ambiguity/open-endedness.
arXiv Detail & Related papers (2024-12-20T09:02:26Z)
Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning [0.0]
Large Language Models (LLMs) have gained significant popularity in recent years for their ability to answer questions in various fields. We introduce an analysis for evaluating the performance of popular open-source LLMs. We focus on the relationship between answer accuracy and variability in topics related to physics.
arXiv Detail & Related papers (2024-11-18T13:42:13Z)
Uncertainty Estimation of Large Language Models in Medical Question Answering [60.72223137560633]
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. We benchmark popular uncertainty estimation (UE) methods with different model sizes on medical question-answering datasets. Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications.
arXiv Detail & Related papers (2024-07-11T16:51:33Z)
Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries [6.249216559519607]
We estimate the uncertainty of closed-source large language models via multiple rephrasings of an original base query. Our method demonstrates significant improvements in the calibration of uncertainty estimates compared to the baseline.
arXiv Detail & Related papers (2024-05-22T18:28:26Z)
Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs. Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination". We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement [54.55643652781891]
Conversational Question Answering (ConvQA) models aim at answering a question with its relevant paragraph and previous question-answer pairs that occurred during conversation multiple times. We propose to filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model. We validate our models, Answer Selection-based realistic Conversation Question Answering, on two standard ConvQA datasets.
arXiv Detail & Related papers (2023-02-10T09:42:07Z)
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization [73.04187954213471]
We introduce a unified learning approach to simultaneously model coarse- and fine-grained retrieval. The proposed method has achieved +4.03%, +3.38%, and +2.40% Recall@50 accuracy over a strong baseline.
arXiv Detail & Related papers (2022-11-14T14:25:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.