AdaIAT: Adaptively Increasing Attention to Generated Text to Alleviate Hallucinations in LVLM
- URL: http://arxiv.org/abs/2603.04908v1
- Date: Thu, 05 Mar 2026 07:52:11 GMT
- Title: AdaIAT: Adaptively Increasing Attention to Generated Text to Alleviate Hallucinations in LVLM
- Authors: Li'an Zhong, Ziqiang He, Jibin Zheng, Jin Li, Z. Jane Wang, Xiangui Kang
- Abstract summary: We propose Increasing Attention to Generated Text (IAT) to alleviate hallucinations while avoiding repetitive descriptions. AdaIAT employs a layer-wise threshold to control intervention timing and fine-grained amplification magnitudes tailored to the characteristics of each attention head.
- Score: 28.68190828797428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hallucination has been a significant impediment to the development and application of current Large Vision-Language Models (LVLMs). One intuitive and effective way to mitigate hallucinations is to directly increase attention weights on image tokens during inference. Although this effectively reduces the hallucination rate, it often induces repetitive descriptions. To address this, we first analyze attention patterns and reveal that real object tokens tend to assign higher attention to the generated text than hallucinated ones do. This inspires us to leverage the generated text, which contains instruction-related visual information and contextual knowledge, to alleviate hallucinations while maintaining linguistic coherence. We therefore propose Increasing Attention to Generated Text (IAT) and demonstrate that it significantly reduces the hallucination rate while avoiding repetitive descriptions. To prevent naive amplification from impairing the inherent prediction capabilities of LVLMs, we further explore Adaptive IAT (AdaIAT), which employs a layer-wise threshold to control intervention timing and applies fine-grained amplification magnitudes tailored to the characteristics of each attention head. Both analysis and experiments demonstrate the effectiveness of AdaIAT. Results on several LVLMs show that AdaIAT effectively alleviates hallucination (reducing the hallucination rates $C_S$ and $C_I$ on LLaVA-1.5 by 35.8% and 37.1%, respectively) while preserving linguistic performance and prediction capability, achieving an attractive trade-off.
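The abstract describes the mechanism only at a high level. As a rough illustration of the idea — boosting post-softmax attention on generated-text key positions, gated by a layer-wise threshold and scaled per attention head — a minimal PyTorch sketch might look as follows. The function name, the gating rule, and how the per-head factors are chosen are all assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of IAT/AdaIAT-style attention amplification.
import torch

def adaiat_attention(attn, gen_text_mask, head_scale, layer_threshold=0.1):
    """Amplify attention to generated-text tokens, then renormalize.

    attn:            (batch, heads, q_len, k_len) post-softmax weights
    gen_text_mask:   (k_len,) bool, True at generated-text key positions
    head_scale:      (heads,) per-head amplification factors (>= 1)
    layer_threshold: intervene only if this layer's mean attention mass
                     on generated text is below the threshold (assumed rule)
    """
    gen_mass = attn[..., gen_text_mask].sum(dim=-1).mean()
    if gen_mass >= layer_threshold:   # layer already attends enough: skip
        return attn
    scale = torch.ones_like(attn)
    # broadcast per-head factors over the generated-text key positions
    scale[..., gen_text_mask] = head_scale.view(1, -1, 1, 1)
    boosted = attn * scale
    return boosted / boosted.sum(dim=-1, keepdim=True)  # rows sum to 1 again
```

Renormalizing after the boost keeps each row a valid distribution, which is presumably what lets the intervention amplify generated-text context without destroying the model's linguistic coherence.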
Related papers
- Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs [26.144870818163387]
We propose a framework that models the hallucination process via a structural causal graph. We introduce VTACR, a novel metric that quantifies the modality contribution imbalance during decoding. We design a fine-grained attention intervention mechanism that dynamically adjusts token- and layer-wise attention.
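The summary does not define how VTACR is computed; purely as a hypothetical reading, a visual-to-textual attention contribution ratio could be sketched like this (the name expansion and exact form are assumptions):

```python
import torch

def vtacr(attn, visual_mask, textual_mask, eps=1e-8):
    """Hypothetical visual-to-textual attention contribution ratio.

    attn:         (heads, k_len) attention weights at one decoding step
    visual_mask:  (k_len,) bool marking image-token key positions
    textual_mask: (k_len,) bool marking text-token key positions
    A small value would indicate text-dominated decoding at this step.
    """
    visual = attn[:, visual_mask].sum()
    textual = attn[:, textual_mask].sum()
    return (visual / (textual + eps)).item()
```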
arXiv Detail & Related papers (2025-11-12T06:13:26Z)
- Exposing Hallucinations To Suppress Them: VLMs Representation Editing With Generative Anchors [8.089908150148554]
Multimodal large language models (MLLMs) have achieved remarkable success across diverse vision-language tasks. However, MLLMs are highly susceptible to hallucinations, producing content that is fluent but inconsistent with visual evidence. We propose a training-free, self-supervised method for hallucination mitigation.
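The abstract does not spell out the editing rule. One common form of training-free representation editing, offered here only as an assumed sketch, subtracts a "hallucination direction" estimated from anchor activations; the paper's actual use of generative anchors may differ.

```python
import torch

def edit_hidden_states(h, halluc_anchor, faithful_anchor, alpha=1.0):
    """Assumed steering-style edit, not the paper's verified method.

    h:               (seq_len, d) hidden states to edit
    halluc_anchor:   (d,) mean activation from a hallucinated generation
    faithful_anchor: (d,) mean activation from a faithful generation
    Removes the component of h along the faithful-to-hallucinated direction.
    """
    direction = halluc_anchor - faithful_anchor
    direction = direction / direction.norm()
    return h - alpha * (h @ direction).unsqueeze(-1) * direction
```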
arXiv Detail & Related papers (2025-09-26T07:24:28Z)
- CAI: Caption-Sensitive Attention Intervention for Mitigating Object Hallucination in Large Vision-Language Models [60.0300765815417]
Large Vision-Language Models (LVLMs) frequently produce content that deviates from visual information, leading to object hallucination. We propose Caption-sensitive Attention Intervention (CAI), a training-free, plug-and-play hallucination mitigation method.
arXiv Detail & Related papers (2025-06-30T07:52:36Z)
- Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression [16.465703269782654]
Large vision language models (LVLMs) often suffer from hallucinations, generating text misaligned with the visual context. Existing methods that reduce hallucinations through inference-time intervention incur a significant increase in latency. We present SPIN, a task-agnostic, attention-guided head suppression strategy that can be seamlessly integrated during inference.
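As a rough sketch of what image-attention-guided head suppression could look like (the ranking criterion and keep-ratio are assumptions; SPIN's actual rule may differ):

```python
import torch

def suppress_heads(head_outputs, attn, image_mask, keep_ratio=0.8):
    """Zero out heads that pay the least attention to image tokens.

    head_outputs: (heads, d) per-head outputs at the current step
    attn:         (heads, k_len) per-head attention weights for this step
    image_mask:   (k_len,) bool marking image-token key positions
    """
    img_attn = attn[:, image_mask].sum(dim=-1)              # (heads,)
    k = max(1, int(keep_ratio * head_outputs.size(0)))
    keep = img_attn.topk(k).indices                         # most image-focused
    mask = torch.zeros(head_outputs.size(0), dtype=head_outputs.dtype)
    mask[keep] = 1.0
    return head_outputs * mask.unsqueeze(-1)
```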
arXiv Detail & Related papers (2025-05-22T09:00:57Z)
- Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation [123.54980913741828]
Large Vision-Language Models (LVLMs) remain vulnerable to hallucinations. We propose a novel, training-free strategy named Attention HIjackers Detection and Disentanglement (AID). AID identifies Attention Hijackers by calculating instruction-driven visual salience. Next, an Attention Disentanglement mechanism is proposed to mask the visual attention of these identified Hijackers. Re-Disentanglement then recalculates the balance between instruction-driven and image-driven visual salience to avoid over-masking effects.
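A speculative sketch of the detect-and-mask steps follows; how the instruction-driven salience is computed, the divergence measure, and the threshold are all assumptions, and the Re-Disentanglement rebalancing step is omitted here.

```python
import torch

def aid_mask_hijackers(attn, image_mask, instr_salience, tau=0.5):
    """Assumed sketch of hijacker detection and visual-attention masking.

    attn:           (heads, q_len, k_len) attention weights
    image_mask:     (k_len,) bool marking image-token key positions
    instr_salience: (n_img,) instruction-driven salience over image tokens
    """
    vis = attn[..., image_mask]                                  # (heads, q, n_img)
    vis = vis / vis.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    sal = instr_salience / instr_salience.sum()
    # a head is a "hijacker" if its visual attention diverges from the
    # instruction-driven salience (L1 distance used here as a stand-in)
    hijacker = (vis - sal).abs().sum(dim=-1).mean(dim=-1) > tau  # (heads,)
    drop = hijacker.view(-1, 1, 1) & image_mask.view(1, 1, -1)
    masked = attn.masked_fill(drop, 0.0)
    return masked / masked.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```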
arXiv Detail & Related papers (2025-03-11T09:35:55Z)
- Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence [69.86946427928511]
We investigate the internal mechanisms driving hallucination in large vision-language models (LVLMs). We introduce Vision-aware Head Divergence (VHD), a metric that quantifies the sensitivity of attention head outputs to visual context. We propose Vision-aware Head Reinforcement (VHR), a training-free approach to mitigate hallucination by enhancing the role of vision-aware attention heads.
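One plausible reading of the VHD/VHR pairing, offered only as an assumed sketch (the paper's exact divergence measure and reinforcement rule may differ): score each head by how much its output changes when the visual context is removed, then amplify the most vision-aware heads.

```python
import torch

def vhd_scores(head_out_with_img, head_out_without_img):
    """Assumed vision-aware head divergence: both inputs are (heads, d)
    per-head outputs for the same step, computed with and without the
    image; larger divergence suggests a more vision-sensitive head."""
    return (head_out_with_img - head_out_without_img).norm(dim=-1)

def vhr_reinforce(head_outputs, scores, top_k=8, gain=1.5):
    """Amplify the top-k most vision-aware heads (assumed rule)."""
    boost = torch.ones(head_outputs.size(0))
    boost[scores.topk(top_k).indices] = gain
    return head_outputs * boost.unsqueeze(-1)
```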
arXiv Detail & Related papers (2024-12-18T15:29:30Z)
- Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs [52.497823009176074]
Large Vision-Language Models (LVLMs) often produce responses that misalign with factual information, a phenomenon known as hallucinations. We introduce Visual Description Grounded Decoding (VDGD), a training-free method designed to enhance visual perception and improve reasoning capabilities in LVLMs.
arXiv Detail & Related papers (2024-05-24T16:21:59Z)
- Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization [123.54980913741828]
Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data. They invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information. However, they struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations.
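For context, the visual-contrastive-decoding family this paper critiques typically contrasts next-token logits from the clean image against logits from a distorted copy; a generic form of the rule (parameter names assumed, not this paper's method) is:

```python
import torch

def visual_contrastive_logits(logits_clean, logits_distorted, alpha=1.0):
    """Generic visual contrastive decoding step: amplify what the clean
    image supports and subtract what a distorted image (injected visual
    uncertainty) still predicts, down-weighting language priors."""
    return (1.0 + alpha) * logits_clean - alpha * logits_distorted

# example: pick the next token from the contrasted distribution
next_token = visual_contrastive_logits(
    torch.randn(32000), torch.randn(32000)).softmax(dim=-1).argmax()
```

The paper's criticism is that such distortions only loosely target the tokens that actually hallucinate.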
arXiv Detail & Related papers (2024-05-24T08:46:31Z)
- Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [40.930238150365795]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback. We generate a small-scale hallucination annotation dataset using proprietary models. Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z)
- Hallucination Augmented Contrastive Learning for Multimodal Large Language Model [53.65682783591723]
Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks.
However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information.
In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning.
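The summary does not give the contrastive objective; an InfoNCE-style sketch in which hallucinated text embeddings serve as hard negatives for the paired image, offered purely as an assumption, could look like:

```python
import torch
import torch.nn.functional as F

def halluc_contrastive_loss(text_emb, img_emb, halluc_emb, temp=0.07):
    """Assumed sketch: image-to-text InfoNCE where hallucinated-text
    embeddings act as hard negatives. All inputs are (batch, d)."""
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(img_emb, dim=-1)
    h = F.normalize(halluc_emb, dim=-1)
    pos = (v * t).sum(dim=-1, keepdim=True)   # (batch, 1) matched pairs
    neg = v @ h.t()                           # (batch, batch) halluc negatives
    logits = torch.cat([pos, neg], dim=1) / temp
    labels = torch.zeros(logits.size(0), dtype=torch.long)  # positive at index 0
    return F.cross_entropy(logits, labels)
```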
arXiv Detail & Related papers (2023-12-12T04:05:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.