Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
- URL: http://arxiv.org/abs/2506.07184v1
- Date: Sun, 08 Jun 2025 15:08:52 GMT
- Title: Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
- Authors: Liangliang You, Junchi Yao, Shu Yang, Guimin Hu, Lijie Hu, Di Wang
- Abstract summary: We introduce SHE (Sequence Hallucination Eradication), a lightweight framework that detects hallucinations and mitigates them. We also propose a new metric (BEACH) to quantify behavioral hallucination severity.
- Score: 6.48620624181578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While multimodal large language models excel at various tasks, they still suffer from hallucinations, which limit their reliability and scalability for broader domain applications. To address this issue, recent research mainly focuses on objective hallucination. However, for sequential images, besides objective hallucination, there is also behavioral hallucination, which is less studied. This work aims to fill this gap. We first reveal that behavioral hallucinations mainly arise from two key factors: prior-driven bias and the snowball effect. Based on these observations, we introduce SHE (Sequence Hallucination Eradication), a lightweight, two-stage framework that (1) detects hallucinations via a visual-textual alignment check using our proposed adaptive temporal window and (2) mitigates them via orthogonal projection onto the joint embedding space. We also propose a new metric (BEACH) to quantify behavioral hallucination severity. Empirical results on standard benchmarks demonstrate that SHE reduces behavioral hallucination by over 10% on BEACH while maintaining descriptive accuracy.
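The sketch below illustrates how the two stages described in the abstract might look in code; it is an illustration under stated assumptions, not the authors' implementation. It assumes CLIP-style embeddings for frames and generated behavior phrases, treats the adaptive temporal window as an externally supplied index set (the adaptive selection is not shown), uses a max-cosine alignment score with a hypothetical threshold `tau`, and realizes the mitigation step as a projection of the text embedding onto the subspace spanned by the windowed visual embeddings.

```python
import numpy as np

def she_check_and_project(text_emb, frame_embs, window, tau=0.25):
    """Illustrative two-stage check; all concrete choices here are assumptions.

    text_emb   : (d,) embedding of a generated behavior phrase (assumed CLIP-style).
    frame_embs : (T, d) embeddings of the image sequence.
    window     : indices of the adaptive temporal window (selection not shown).
    tau        : hypothetical alignment threshold for flagging a hallucination.
    """
    V = frame_embs[window]                            # (w, d) visual context in the window
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)

    # Stage 1: visual-textual alignment check over the temporal window.
    alignment = float(np.max(V @ t))                  # best cosine similarity in the window
    if alignment >= tau:
        return text_emb, False                        # phrase looks grounded; leave it alone

    # Stage 2: mitigate by projecting the text embedding onto the span of the
    # windowed visual embeddings, dropping the visually unsupported component.
    Q, _ = np.linalg.qr(V.T)                          # orthonormal basis of the visual span
    projected = Q @ (Q.T @ text_emb)
    return projected, True                            # corrected embedding + hallucination flag
```

In the paper, the window length is chosen adaptively and flagged behaviors feed the BEACH severity metric; both of those pieces are outside this sketch.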
Related papers
- MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models [73.20126092411776]
We conduct the first systematic study of hallucinations in multi-image MLLMs. We propose MIHBench, a benchmark specifically tailored for evaluating object-related hallucinations across multiple images. MIHBench comprises three core tasks: Multi-Image Object Existence Hallucination, Multi-Image Object Count Hallucination, and Object Identity Consistency Hallucination.
arXiv Detail & Related papers (2025-08-01T15:49:29Z)
- Beyond Facts: Evaluating Intent Hallucination in Large Language Models [13.315302240710164]
FAITHQA is a novel benchmark for intent hallucination that contains 20,068 problems. We find that intent hallucination is a common issue even for state-of-the-art models. We introduce an automatic LLM generation evaluation metric, CONSTRAINT SCORE, for detecting intent hallucination.
arXiv Detail & Related papers (2025-06-06T21:10:55Z)
- Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression [6.838584336878126]
Large vision language models (LVLMs) often suffer from hallucinations, generating texts misaligned with the visual context. Existing methods aimed at reducing hallucinations through inference-time intervention incur a significant increase in latency. We present SPIN, a task-agnostic attention-guided head suppression strategy that can be seamlessly integrated during inference.
arXiv Detail & Related papers (2025-05-22T09:00:57Z)
- Valuable Hallucinations: Realizable Non-realistic Propositions [2.451326684641447]
This paper introduces the first formal definition of valuable hallucinations in large language models (LLMs). We focus on the potential value that certain types of hallucinations can offer in specific contexts. We present experiments using the Qwen2.5 model and HalluQA dataset, employing ReAct prompting to control and optimize hallucinations.
arXiv Detail & Related papers (2025-02-16T12:59:11Z)
- A Unified Hallucination Mitigation Framework for Large Vision-Language Models [18.595958586621943]
We present a unified framework, Dentist, for hallucination mitigation.
The core step is to first classify the query and then apply a different hallucination mitigation process depending on the classification result (a minimal dispatch sketch in this spirit follows this list).
On MMBench, we achieve 13.44%/10.2%/15.8% improvements in accuracy on the Image Quality task.
arXiv Detail & Related papers (2024-09-24T22:36:58Z)
- Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs [54.50483041708911]
Hallu-PI is the first benchmark designed to evaluate hallucination in MLLMs within perturbed inputs.
Hallu-PI consists of seven perturbed scenarios, containing 1,260 perturbed images from 11 object types.
Our research reveals a severe bias in MLLMs' ability to handle different types of hallucinations.
arXiv Detail & Related papers (2024-08-02T16:07:15Z)
- ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models [65.12177400764506]
Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domain coverage and size. This paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset.
arXiv Detail & Related papers (2024-07-05T17:56:38Z)
- Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [40.930238150365795]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback. We generate a small hallucination annotation dataset with proprietary models. Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z)
- On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on the reasons for LLMs' hallucinations on their known facts and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
- Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models [57.42800112251644]
We focus on a specific type of hallucination, number hallucination, which refers to models incorrectly identifying the number of certain objects in pictures.
We devise a training approach aimed at improving consistency to reduce number hallucinations, which leads to an 8% enhancement in performance over direct finetuning methods.
arXiv Detail & Related papers (2024-03-03T02:31:11Z)
- Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generating factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z)
- On Early Detection of Hallucinations in Factual Question Answering [4.76359068115052]
Hallucinations remain a major impediment to gaining user trust.
In this work, we explore if the artifacts associated with the model generations can provide hints that the generation will contain hallucinations.
Our results show that the distributions of these artifacts tend to differ between hallucinated and non-hallucinated generations.
arXiv Detail & Related papers (2023-12-19T14:35:04Z)
- Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training [66.0036211069513]
Large-scale vision-language pre-trained models are prone to hallucinate non-existent visual objects when generating text.
We show that models achieving better scores on standard metrics could hallucinate objects more frequently.
Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination.
arXiv Detail & Related papers (2022-10-14T10:27:22Z)
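Returning to the Dentist entry above, its classify-then-dispatch core step can be pictured with the minimal sketch below. The category names and mitigation routines here are placeholders for illustration, not the taxonomy or procedures used in that paper.

```python
def mitigate(query, answer, classify, handlers):
    """Route a (query, answer) pair through a category-specific mitigation step.

    classify : callable mapping a query to a category label (placeholder).
    handlers : dict of category label -> mitigation callable (placeholders).
    """
    category = classify(query)                          # step 1: classify the query
    handler = handlers.get(category, lambda q, a: a)    # fall back to a no-op check
    return handler(query, answer)                       # step 2: category-specific mitigation


# Hypothetical usage with stub handlers; a real system would re-check the answer
# against the image or re-run verification prompts inside each handler.
handlers = {
    "perception": lambda q, a: a,   # e.g. re-verify object/attribute claims
    "reasoning":  lambda q, a: a,   # e.g. re-check the reasoning chain
}
cleaned = mitigate("How many cups are on the table?", "Three cups.",
                   classify=lambda q: "perception", handlers=handlers)
```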