HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
- URL: http://arxiv.org/abs/2506.21546v2
- Date: Sat, 28 Jun 2025 15:32:51 GMT
- Title: HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation
- Authors: Xinzhuo Li, Adheesh Juvekar, Xingyou Liu, Muntasir Wahed, Kiet A. Nguyen, Ismini Lourentzou
- Abstract summary: HalluSegBench is the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones.
- Score: 2.2006360539727923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.
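For intuition, here is a minimal sketch of how a factual/counterfactual instance pair could be scored: grounding quality is measured as IoU on the factual image, and hallucination is measured as the fraction of the (now edited-out) object's area that the model still segments on the counterfactual image. The function names and the "counterfactual persistence" ratio below are illustrative assumptions, not the metrics defined in the paper.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def counterfactual_scores(pred_factual: np.ndarray,
                          gt_factual: np.ndarray,
                          pred_counterfactual: np.ndarray) -> dict:
    """Score one factual/counterfactual pair (illustrative, not the paper's metrics).

    pred_factual / pred_counterfactual: boolean masks the model predicts for the
        same object query on the original image and on its edited counterpart
        (where the queried object has been removed or replaced).
    gt_factual: ground-truth mask of the queried object in the original image.
    """
    # How well the model grounds the object when it is actually present.
    factual_iou = iou(pred_factual, gt_factual)
    # Fraction of the original object's area the model still segments after the
    # object has been edited away; higher values indicate stronger hallucination.
    persistence = np.logical_and(pred_counterfactual, gt_factual).sum() / max(int(gt_factual.sum()), 1)
    return {"factual_iou": factual_iou, "counterfactual_persistence": float(persistence)}
```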
Related papers
- A Survey of Multimodal Hallucination Evaluation and Detection [52.03164192840023]
Multi-modal Large Language Models (MLLMs) have emerged as a powerful paradigm for integrating visual and textual information. These models often suffer from hallucination, producing content that appears plausible but contradicts the input content or established world knowledge. This survey offers an in-depth review of hallucination evaluation benchmarks and detection methods across Image-to-Text (I2T) and Text-to-Image (T2I) generation tasks.
arXiv Detail & Related papers (2025-07-25T07:22:42Z)
- Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images [6.48620624181578]
We introduce SHE (Sequence Hallucination Eradication), a lightweight framework that detects hallucinations and mitigates them. We also propose a new metric (BEACH) to quantify behavioral hallucination severity.
arXiv Detail & Related papers (2025-06-08T15:08:52Z)
- When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding [72.15848305976706]
Large Multimodal Models (LMMs) have achieved impressive progress in visual perception and reasoning. When confronted with visually ambiguous or non-semantic scene text, they often struggle to accurately spot and understand the content. We propose a training-free semantic hallucination mitigation framework comprising two key components.
arXiv Detail & Related papers (2025-06-05T19:53:19Z)
- Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression [6.838584336878126]
Large vision language models (LVLMs) often suffer from hallucinations, generating text misaligned with the visual context. Existing methods that aim to reduce hallucinations through inference-time intervention incur a significant increase in latency. We present SPIN, a task-agnostic, attention-guided head suppression strategy that can be seamlessly integrated during inference.
arXiv Detail & Related papers (2025-05-22T09:00:57Z)
- HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination". This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z)
- HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning [5.130890556960832]
This work delves into the intricacies of hallucinatory phenomena exhibited by widely used image captioners, unraveling interesting patterns. The deterministic and efficient conceptual-counterfactual backbone suggests semantically minimal edits. Our proposed hallucination detection framework is highly interpretable, providing semantically meaningful edits in addition to standalone numbers.
arXiv Detail & Related papers (2025-03-01T10:28:19Z)
- Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models [57.58426038241812]
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance in complex multimodal tasks. These models still suffer from hallucinations when required to implicitly recognize or infer diverse visual entities from images. We propose a novel visual question answering (VQA) benchmark that employs contextual reasoning prompts as hallucination attacks.
arXiv Detail & Related papers (2024-12-29T23:56:01Z)
- AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models [91.78328878860003]
Large vision-language models (LVLMs) are prone to hallucinations.
Existing benchmarks often rely on hand-crafted corner cases whose failure patterns may not generalize well.
We develop AutoHallusion, the first automated benchmark generation approach.
arXiv Detail & Related papers (2024-06-16T11:44:43Z)
- ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models [6.014286500397164]
Hallucinations in vision-language models pose a significant challenge to their reliability, particularly in the generation of long captions.
We introduce ESREAL, a novel unsupervised learning framework designed to suppress the generation of hallucinations through accurate localization and penalization of hallucinated tokens.
Our framework notably reduces hallucinations in LLaVA, InstructBLIP, and mPLUG-Owl2 by 32.81%, 27.08%, and 7.46%, respectively, on the CHAIR metric.
arXiv Detail & Related papers (2024-03-24T14:21:06Z)
- Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models [57.42800112251644]
We focus on a specific type of hallucination, number hallucination, which refers to models incorrectly identifying the number of certain objects in pictures.
We devise a training approach aimed at improving consistency to reduce number hallucinations, which leads to an 8% enhancement in performance over direct finetuning methods.
arXiv Detail & Related papers (2024-03-03T02:31:11Z)
- Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training [66.0036211069513]
Large-scale vision-language pre-trained models are prone to hallucinate non-existent visual objects when generating text.
We show that models achieving better scores on standard metrics could hallucinate objects more frequently.
Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination.
arXiv Detail & Related papers (2022-10-14T10:27:22Z)