VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation
- URL: http://arxiv.org/abs/2411.11919v1
- Date: Mon, 18 Nov 2024 04:06:04 GMT
- Title: VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation
- Authors: Ruiyang Zhang, Hu Zhang, Zhedong Zheng
- Abstract summary: We introduce VL-Uncertainty, the first uncertainty-based framework for detecting hallucinations in large vision-language models.
We measure uncertainty by analyzing the prediction variance across semantically equivalent but perturbed prompts.
When LVLMs are highly confident, they provide consistent responses to semantically equivalent queries.
However, when uncertain, the responses of the target LVLM become more random.
- Score: 18.873512856021357
- License:
- Abstract: Given the higher information load processed by large vision-language models (LVLMs) compared to single-modal LLMs, detecting LVLM hallucinations requires more human and time expense, and thus raises wider safety concerns. In this paper, we introduce VL-Uncertainty, the first uncertainty-based framework for detecting hallucinations in LVLMs. Unlike most existing methods, which require ground-truth or pseudo annotations, VL-Uncertainty utilizes uncertainty as an intrinsic metric. We measure uncertainty by analyzing the prediction variance across semantically equivalent but perturbed prompts, covering both visual and textual data. When LVLMs are highly confident, they provide consistent responses to semantically equivalent queries. However, when uncertain, the responses of the target LVLM become more random. Since semantically similar answers can use different wordings, we cluster LVLM responses based on their semantic content and then calculate the cluster distribution entropy as the uncertainty measure to detect hallucination. Our extensive experiments on 10 LVLMs across four benchmarks, covering both free-form and multi-choice tasks, show that VL-Uncertainty significantly outperforms strong baseline methods in hallucination detection.
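The core computation described in the abstract is straightforward to sketch: sample answers to semantically equivalent (perturbed) prompts, cluster them by meaning, and take the entropy of the cluster distribution. The sketch below is a minimal illustration of that idea, not the authors' released code; the helpers `equivalent`, `ask_lvlm`, `perturb_image`, and `paraphrase` are hypothetical placeholders assumed for this sketch.

```python
import math
from typing import Callable, List

def cluster_entropy_uncertainty(
    responses: List[str],
    equivalent: Callable[[str, str], bool],
) -> float:
    """Cluster responses by semantic equivalence and return the entropy of
    the cluster distribution; higher entropy suggests higher uncertainty.

    `equivalent` is a hypothetical predicate (e.g. a bidirectional
    entailment check) assumed for this sketch.
    """
    clusters: List[List[str]] = []
    for r in responses:
        for c in clusters:
            if equivalent(r, c[0]):  # greedy: join the first matching cluster
                c.append(r)
                break
        else:
            clusters.append([r])  # no match: start a new cluster
    n = len(responses)
    probs = [len(c) / n for c in clusters]
    # H = -sum_c p_c * log(p_c): 0 when all answers agree, log(n) when all differ.
    return -sum(p * math.log(p) for p in probs)

# Usage sketch (all helpers hypothetical): query the LVLM once per perturbed
# but semantically equivalent prompt, then flag high-entropy answer sets.
# responses = [ask_lvlm(perturb_image(img, s), paraphrase(question, s))
#              for s in range(10)]
# hallucination = cluster_entropy_uncertainty(responses, equivalent) > threshold
```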
Related papers
- Reference-free Hallucination Detection for Large Vision-Language Models [19.36348897433261]
Large vision-language models (LVLMs) have made significant progress in recent years.
LVLMs exhibit excellent ability in language understanding, question answering, and conversations about visual inputs.
However, they are prone to producing hallucinations.
Several methods have been proposed to evaluate hallucinations in LVLMs; most are reference-based and depend on external tools.
arXiv Detail & Related papers (2024-08-11T13:17:14Z)
- LLM Internal States Reveal Hallucination Risk Faced With a Query [62.29558761326031]
Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries.
This paper investigates whether Large Language Models can estimate their own hallucination risk before response generation.
With a probing estimator, we leverage LLM self-assessment, achieving an average hallucination estimation accuracy of 84.32% at run time.
arXiv Detail & Related papers (2024-07-03T17:08:52Z)
- Semantically Diverse Language Generation for Uncertainty Estimation in Language Models [5.8034373350518775]
Large language models (LLMs) can suffer from hallucinations when generating text.
Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens.
We introduce Semantically Diverse Language Generation to quantify predictive uncertainty in LLMs.
arXiv Detail & Related papers (2024-06-06T17:53:34Z)
- To Believe or Not to Believe Your LLM [51.2579827761899]
We explore uncertainty quantification in large language models (LLMs).
We derive an information-theoretic metric that allows us to reliably detect when only epistemic uncertainty is large.
We conduct a series of experiments which demonstrate the advantage of our formulation.
arXiv Detail & Related papers (2024-06-04T17:58:18Z)
- Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
- Semantic Density: Uncertainty Quantification for Large Language Models through Confidence Measurement in Semantic Space [14.715989394285238]
Existing Large Language Models (LLMs) do not have an inherent functionality to provide users with an uncertainty/confidence metric for each response they generate.
A new framework is proposed in this paper to address this issue.
Semantic density extracts uncertainty/confidence information for each response from a probability distribution perspective in semantic space.
arXiv Detail & Related papers (2024-05-22T17:13:49Z)
- Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations [63.330182403615886]
A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability.
Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety.
In all three cases, models should ideally abstain from responding, much like humans, whose understanding of uncertainty makes them refrain from answering questions they don't know.
arXiv Detail & Related papers (2024-04-16T23:56:38Z)
- Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs).
We propose a novel formulation and corresponding estimation method to quantify both types of uncertainty.
The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z)
- INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection [39.52923659121416]
We propose to explore the dense semantic information retained within INternal States for hallucInation DEtection (INSIDE).
A simple yet effective EigenScore metric is proposed to better evaluate responses' self-consistency (a generic sketch of this style of scoring appears after this list).
A test-time feature clipping approach is explored to truncate extreme activations in the internal states.
arXiv Detail & Related papers (2024-02-06T06:23:12Z)
- Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
However, LLMs are prone to hallucinating untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
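Several entries above (INSIDE, Kernel Language Entropy, Semantic Density) score hallucination risk from the consistency of multiple sampled responses. As a rough, generic illustration of the EigenScore-style idea referenced in the INSIDE entry, the sketch below computes an eigenvalue-based diversity score over response embeddings. Using a plain sentence encoder instead of the model's internal states, and the regularisation constant, are assumptions of this sketch rather than that paper's implementation.

```python
import numpy as np

def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """EigenScore-style self-consistency score for K sampled responses.

    `embeddings` is a (K, d) array of response embeddings (any sentence
    encoder will do for this sketch). Larger values mean more diverse,
    less consistent answers, i.e. higher hallucination risk.
    """
    k = embeddings.shape[0]
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # K x K Gram matrix of the centered embeddings, regularised for stability.
    gram = centered @ centered.T + alpha * np.eye(k)
    eigvals = np.linalg.eigvalsh(gram)
    # Mean log-eigenvalue, i.e. (1/K) * log det(gram).
    return float(np.mean(np.log(eigvals)))
```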