Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
- URL: http://arxiv.org/abs/2602.05535v1
- Date: Thu, 05 Feb 2026 10:51:39 GMT
- Title: Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
- Authors: Tao Huang, Rui Wang, Xiaofei Liu, Yi Qin, Li Duan, Liping Jing
- Abstract summary: Large vision-language models (LVLMs) have shown substantial advances in multimodal understanding and generation. They frequently produce unreliable or even harmful content, such as fact hallucinations or dangerous instructions. Evidential Uncertainty Quantification (EUQ) captures both information conflict and ignorance for effective detection of LVLM misbehaviors.
- Score: 27.02252748004729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large vision-language models (LVLMs) have shown substantial advances in multimodal understanding and generation. However, when presented with incompetent or adversarial inputs, they frequently produce unreliable or even harmful content, such as fact hallucinations or dangerous instructions. This misalignment with human expectations, referred to as misbehaviors of LVLMs, raises serious concerns for deployment in critical applications. These misbehaviors are found to stem from epistemic uncertainty, specifically either conflicting internal knowledge or the absence of supporting information. However, existing uncertainty quantification methods, which typically capture only overall epistemic uncertainty, have shown limited effectiveness in identifying such issues. To address this gap, we propose Evidential Uncertainty Quantification (EUQ), a fine-grained method that captures both information conflict and ignorance for effective detection of LVLM misbehaviors. In particular, we interpret features from the model output head as either supporting (positive) or opposing (negative) evidence. Leveraging Evidence Theory, we model and aggregate this evidence to quantify internal conflict and knowledge gaps within a single forward pass. We extensively evaluate our method across four categories of misbehavior, including hallucinations, jailbreaks, adversarial vulnerabilities, and out-of-distribution (OOD) failures, using state-of-the-art LVLMs, and find that EUQ consistently outperforms strong baselines, showing that hallucinations correspond to high internal conflict and OOD failures to high ignorance. Furthermore, layer-wise evidential uncertainty dynamics analysis helps interpret the evolution of internal representations from a new perspective. The source code is available at https://github.com/HT86159/EUQ.
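The abstract describes the computation only at a high level, so the snippet below is a hypothetical sketch (not the released implementation linked above) of how output-head features might be split into supporting and opposing evidence and combined, Dempster-Shafer style, into per-class conflict and ignorance scores in a single forward pass. The per-feature decomposition, the evidence-to-mass mapping, and all names are assumptions made purely for illustration.

```python
# Illustrative sketch only -- NOT the authors' EUQ implementation. It shows one
# plausible way to (a) read per-class positive/negative evidence off the output
# head and (b) combine it with Dempster's rule on the binary frame {k, not-k}
# to obtain conflict and ignorance scores. Shapes and scaling are assumptions.
import torch
import torch.nn.functional as F


def evidence_from_output_head(hidden: torch.Tensor, unembed: torch.Tensor):
    """Split each class logit into supporting vs. opposing feature contributions.

    hidden:  (d,)   last hidden state fed to the output head
    unembed: (V, d) output-head (unembedding) weight matrix
    Returns positive and negative evidence, each of shape (V,).
    """
    contrib = unembed * hidden            # (V, d) per-feature contributions to each logit
    e_pos = F.relu(contrib).sum(dim=-1)   # features pushing the class up   -> supporting evidence
    e_neg = F.relu(-contrib).sum(dim=-1)  # features pushing the class down -> opposing evidence
    return e_pos, e_neg


def conflict_and_ignorance(e_pos: torch.Tensor, e_neg: torch.Tensor):
    """Dempster-style combination on the frame {class, not-class} for every class.

    Source A assigns mass b_pos to {class} and the rest to ignorance;
    source B assigns mass b_neg to {not-class} and the rest to ignorance.
    """
    b_pos = e_pos / (e_pos + 1.0)              # crude evidence-to-mass mapping (assumption)
    b_neg = e_neg / (e_neg + 1.0)
    kappa = b_pos * b_neg                      # mass on contradictory sets = conflict
    ignorance = (1 - b_pos) * (1 - b_neg) / (1 - kappa + 1e-8)  # combined mass on the full frame
    return kappa, ignorance


if __name__ == "__main__":
    d, vocab = 64, 100
    hidden = torch.randn(d)
    unembed = torch.randn(vocab, d) / d ** 0.5
    e_pos, e_neg = evidence_from_output_head(hidden, unembed)
    conflict, ignorance = conflict_and_ignorance(e_pos, e_neg)
    k = torch.argmax(e_pos - e_neg).item()     # most-supported class
    print(f"class {k}: conflict={conflict[k].item():.3f}, ignorance={ignorance[k].item():.3f}")
```

In this toy reading, a high conflict score for the predicted token would flag answers backed by contradictory internal evidence (the abstract associates this with hallucinations), while a high ignorance score would flag answers with little evidence either way (associated with OOD failures).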
Related papers
- FaithSCAN: Model-Driven Single-Pass Hallucination Detection for Faithful Visual Question Answering [14.550872089352943]
FaithSCAN is a lightweight network that detects hallucinations by exploiting rich internal signals of vision-language models. We extend the LLM-as-a-Judge paradigm to VQA hallucination and propose a low-cost strategy to automatically generate model-dependent supervision signals. In-depth analysis shows hallucinations arise from systematic internal state variations in visual perception, cross-modal reasoning, and language decoding.
arXiv Detail & Related papers (2026-01-01T09:19:39Z)
- HaluNet: Multi-Granular Uncertainty Modeling for Efficient Hallucination Detection in LLM Question Answering [12.183015986299438]
We present HaluNet, a lightweight and trainable neural framework that integrates multi-granular token-level uncertainties. Experiments on SQuAD, TriviaQA, and Natural Questions show that HaluNet delivers strong detection performance and favorable computational efficiency.
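The summary above does not describe HaluNet's architecture; purely as a generic sketch of a "lightweight, trainable head over multi-granular token-level uncertainties", one might wire per-token log-probabilities and entropies into a small classifier as below. The feature set and layer sizes are assumptions, not the paper's design.

```python
# Generic illustration of the idea named in the summary -- a small trainable head
# over multi-granular token-level uncertainty features. NOT HaluNet's actual
# architecture; feature choices and layer sizes are assumptions.
import torch
import torch.nn as nn


class UncertaintyHead(nn.Module):
    def __init__(self, in_dim: int = 6, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(feats))  # probability the answer is hallucinated


def uncertainty_features(token_logprobs: torch.Tensor, token_entropies: torch.Tensor) -> torch.Tensor:
    """Pool token-level signals at several granularities into one feature vector."""
    half = len(token_logprobs) // 2
    return torch.stack([
        token_logprobs.mean(), token_logprobs.min(),    # sequence-level and worst-token log-prob
        token_entropies.mean(), token_entropies.max(),  # average and peak predictive entropy
        token_logprobs[:half].mean(),                   # early-span log-prob
        token_logprobs[half:].mean(),                   # late-span log-prob
    ])


if __name__ == "__main__":
    lp = torch.log(torch.rand(12))   # fake per-token log-probabilities of an answer
    ent = torch.rand(12) * 3.0       # fake per-token entropies
    head = UncertaintyHead()
    print(head(uncertainty_features(lp, ent)).item())
```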
arXiv Detail & Related papers (2025-12-31T02:03:10Z)
- HACK: Hallucinations Along Certainty and Knowledge Axes [66.66625343090743]
We propose a framework for categorizing hallucinations along two axes: knowledge and certainty. We identify a particularly concerning subset of hallucinations where models hallucinate with certainty despite having the correct knowledge internally.
arXiv Detail & Related papers (2025-10-28T09:34:31Z)
- Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs [72.8370367403852]
Vision-Language Models (VLMs) achieve strong results on multimodal tasks such as visual question answering, yet they can still fail even when the correct visual evidence is present. We show that shallow layers focus primarily on text, while deeper layers sparsely but reliably attend to localized evidence regions. We introduce an inference-time intervention that highlights deep-layer evidence regions through selective attention-based masking.
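The intervention itself is only named in the summary; the snippet below is a hypothetical illustration of selective attention-based masking that keeps the image patches a deep layer attends to most. The tensor layout and the top-k rule are assumptions rather than the paper's procedure.

```python
# Hypothetical sketch of selective attention-based masking: use a deep layer's
# attention over image patches to keep only the most-attended evidence regions.
# NOT the paper's implementation; thresholding rule and layout are assumptions.
import torch


def evidence_mask_from_attention(attn: torch.Tensor, keep_ratio: float = 0.2) -> torch.Tensor:
    """attn: (num_text_tokens, num_image_patches) deep-layer attention weights.

    Returns a boolean mask over image patches marking the evidence regions to keep.
    """
    patch_scores = attn.mean(dim=0)                       # average attention each patch receives
    k = max(1, int(keep_ratio * patch_scores.numel()))
    keep = torch.zeros_like(patch_scores, dtype=torch.bool)
    keep[patch_scores.topk(k).indices] = True
    return keep


if __name__ == "__main__":
    attn = torch.rand(16, 196).softmax(dim=-1)  # fake attention from 16 text tokens to 14x14 patches
    mask = evidence_mask_from_attention(attn)
    print(f"kept {mask.sum().item()} of {mask.numel()} patches")
    # A second forward pass would then down-weight or zero out visual tokens where mask is False.
```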
arXiv Detail & Related papers (2025-10-20T17:31:09Z)
- Can Multiple Responses from an LLM Reveal the Sources of Its Uncertainty? [11.309445539128733]
Large language models (LLMs) have delivered significant breakthroughs across diverse domains but can still produce unreliable or misleading outputs. We show that, when an LLM is uncertain, the patterns of disagreement among its multiple generated responses contain rich clues about the underlying cause of uncertainty.
arXiv Detail & Related papers (2025-08-28T20:14:35Z)
- Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs [13.982395477368396]
Large language models (LLMs) have demonstrated remarkable performance across diverse tasks by encoding vast amounts of factual knowledge. They are still prone to hallucinations, generating incorrect or misleading information, often accompanied by high uncertainty. We introduce Semantic Volume, a novel measure for quantifying both external and internal uncertainty in LLMs.
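The summary does not say how the volume is computed; as a hedged sketch of a dispersion-style measure in this spirit, one can embed several sampled responses and score uncertainty by the log-determinant of their Gram matrix, as below. The embedding source and the regularizer are assumptions, and the exact formulation is in the cited paper.

```python
# Hedged sketch of a dispersion-based uncertainty measure in the spirit of
# Semantic Volume: embed sampled responses and score uncertainty by the
# regularized log-determinant of their Gram matrix -- a larger spanned volume
# means the answers disagree more. Embedding model and regularizer are assumptions.
import numpy as np


def semantic_volume(embeddings: np.ndarray, eps: float = 1e-3) -> float:
    """embeddings: (n_responses, d) sentence embeddings of sampled responses."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    gram = X @ X.T                                        # pairwise cosine similarities
    _, logdet = np.linalg.slogdet(gram + eps * np.eye(len(X)))
    return float(logdet)                                  # higher = more semantic spread = more uncertain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    agreeing = rng.normal(size=(1, 384)) + 0.05 * rng.normal(size=(5, 384))  # near-duplicate answers
    disagreeing = rng.normal(size=(5, 384))                                   # unrelated answers
    print("agreeing:   ", semantic_volume(agreeing))
    print("disagreeing:", semantic_volume(disagreeing))
```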
arXiv Detail & Related papers (2025-02-28T17:09:08Z)
- SegSub: Evaluating Robustness to Knowledge Conflicts and Hallucinations in Vision-Language Models [6.52323086990482]
Vision-language models (VLMs) demonstrate sophisticated multimodal reasoning yet are prone to hallucination when confronted with knowledge conflicts. This research introduces SegSub, a framework for applying targeted image perturbations to investigate VLM resilience against knowledge conflicts.
arXiv Detail & Related papers (2025-02-19T00:26:38Z)
- Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning [151.4060202671114]
Multimodal large language models (MLLMs) have shown unprecedented capabilities in advancing vision-language tasks. This paper introduces a novel bottom-up reasoning framework to address hallucinations in MLLMs. The framework systematically addresses potential issues in both visual and textual inputs by verifying and integrating perception-level information with cognition-level commonsense knowledge.
arXiv Detail & Related papers (2024-12-15T09:10:46Z)
- Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations [63.330182403615886]
A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability.
Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety.
In all three cases, models should ideally abstain from responding, much like humans, whose awareness of uncertainty makes them refrain from answering questions they do not know the answer to.
arXiv Detail & Related papers (2024-04-16T23:56:38Z)
- Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z)
- Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis [127.85293480405082]
The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.
Existing alignment methods usually direct LLMs toward favorable outcomes by utilizing human-annotated, flawless instruction-response pairs.
This study proposes a novel alignment technique based on mistake analysis, which deliberately exposes LLMs to erroneous content to learn the reasons for mistakes and how to avoid them.
arXiv Detail & Related papers (2023-10-16T14:59:10Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks, including question answering (QA).
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z)