PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning
- URL: http://arxiv.org/abs/2510.19183v1
- Date: Wed, 22 Oct 2025 02:41:07 GMT
- Title: PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning
- Authors: Fengyuan Sun, Hui Chen, Xinhao Xu, Dandan Zheng, Jingdong Chen, Jun Zhou, Jungong Han, Guiguang Ding
- Abstract summary: Hallucinations in multi-modal large language models (MLLMs) are strongly associated with insufficient attention allocated to visual tokens. We propose PruneHal, a training-free, simple yet effective method that leverages adaptive KV cache pruning to enhance the model's focus on critical visual information.
- Score: 87.35309934860938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While multi-modal large language models (MLLMs) have made significant progress in recent years, the issue of hallucinations remains a major challenge. To mitigate this phenomenon, existing solutions either introduce additional data for further training or incorporate external or internal information during inference. However, these approaches inevitably introduce extra computational costs. In this paper, we observe that hallucinations in MLLMs are strongly associated with insufficient attention allocated to visual tokens. In particular, the presence of redundant visual tokens disperses the model's attention, preventing it from focusing on the most informative ones. As a result, critical visual cues are often under-attended, which in turn exacerbates the occurrence of hallucinations. Building on this observation, we propose PruneHal, a training-free, simple yet effective method that leverages adaptive KV cache pruning to enhance the model's focus on critical visual information, thereby mitigating hallucinations. To the best of our knowledge, we are the first to apply token pruning for hallucination mitigation in MLLMs. Notably, our method doesn't require additional training and incurs nearly no extra inference cost. Moreover, PruneHal is model-agnostic and can be seamlessly integrated with different decoding strategies, including those specifically designed for hallucination mitigation. We evaluate PruneHal on several widely used hallucination evaluation benchmarks using four mainstream MLLMs, achieving robust and outstanding results that highlight the effectiveness and superiority of our method. Our code will be publicly available.
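The abstract describes the mechanism only at a high level, so the Python sketch below illustrates one plausible reading of attention-guided KV cache pruning for visual tokens: score each cached visual position by the attention it receives and evict the lowest-scoring entries. All function names, tensor layouts, and the keep_ratio knob are assumptions for illustration, not the paper's released code.

```python
# Hedged sketch of attention-guided KV cache pruning for visual tokens
# (assumed recipe; the paper's exact criterion may differ).
import torch


def prune_visual_kv(keys, values, attn_weights, visual_slice, keep_ratio=0.5):
    """Drop the least-attended visual tokens from one layer's KV cache.

    keys, values : [batch, heads, seq_len, head_dim] cached tensors
    attn_weights : [batch, heads, query_len, seq_len] attention from the
                   current decoding step(s) to all cached positions
    visual_slice : slice covering the visual-token positions along seq_len
    keep_ratio   : fraction of visual tokens to retain (hypothetical knob)
    """
    # Attention mass each visual position receives, averaged over batch,
    # heads, and query positions -> one importance score per visual token.
    scores = attn_weights[..., visual_slice].mean(dim=(0, 1, 2))

    n_visual = scores.shape[0]
    n_keep = max(1, int(n_visual * keep_ratio))
    keep_local = torch.topk(scores, n_keep).indices.sort().values
    keep_visual = keep_local + visual_slice.start

    # Retain every non-visual position plus the selected visual positions.
    seq_len = keys.shape[2]
    all_pos = torch.arange(seq_len, device=keys.device)
    keep_mask = (all_pos < visual_slice.start) | (all_pos >= visual_slice.stop)
    keep_mask[keep_visual] = True

    return keys[:, :, keep_mask], values[:, :, keep_mask]
```

Because the pruning only removes KV entries that receive little attention anyway, the remaining visual tokens take a larger share of attention at subsequent decoding steps, which is consistent with the paper's stated goal of concentrating attention on critical visual cues at essentially no extra inference cost.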
Related papers
- Exposing Hallucinations To Suppress Them: VLMs Representation Editing With Generative Anchors [8.089908150148554]
Multimodal large language models (MLLMs) have achieved remarkable success across diverse vision-language tasks. MLLMs are highly susceptible to hallucinations, producing content that is fluent but inconsistent with visual evidence. We propose a training-free, self-supervised method for hallucination mitigation.
arXiv Detail & Related papers (2025-09-26T07:24:28Z) - Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization [55.543583937522804]
Multimodal Large Language Models (MLLMs) emerge as a unified interface to address a multitude of tasks. Despite showcasing state-of-the-art results in many benchmarks, a long-standing issue is the tendency of MLLMs to hallucinate. In this paper, we address the problem of hallucinations as an alignment problem, seeking to steer the MLLM so that it prefers generating content without hallucinations.
arXiv Detail & Related papers (2025-08-27T18:02:04Z) - IKOD: Mitigating Visual Attention Degradation in Large Vision-Language Models [20.036659182106806]
We show that Large Vision-Language Models (LVLMs) exhibit a long-term bias where hallucinations increase as the sequence length grows. We propose Image attention-guided Key-value merging cOllaborative Decoding (IKOD), a collaborative decoding strategy generating more image-focused sequences.
arXiv Detail & Related papers (2025-08-05T14:05:15Z) - Mitigating Object Hallucination via Robust Local Perception Search [11.570368427723961]
Local Perception Search (LPS) is a decoding method applied during inference that is both simple and training-free, yet effectively suppresses hallucinations. We show that LPS significantly reduces the incidence of hallucinations compared to the baseline, showing exceptional performance, particularly in noisy settings.
arXiv Detail & Related papers (2025-06-07T09:27:26Z) - Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs [62.9348974370985]
We propose attention reallocation (AttnReal) to mitigate hallucinations with nearly zero extra cost. Our approach is motivated by the key observation that the MLLM's unreasonable attention distribution causes features to be dominated by historical output tokens. Based on this observation, AttnReal recycles excessive attention from output tokens and reallocates it to visual tokens, which reduces the MLLM's reliance on language priors.
arXiv Detail & Related papers (2025-03-11T11:52:37Z) - PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training [56.172959986096316]
This paper aims to address the challenge of hallucinations in Multimodal Large Language Models (MLLMs). HalFscore is a novel metric built upon the language graph and is designed to evaluate both the accuracy and completeness of dense captions at a granular level. PerturboLLaVA significantly improves the fidelity of generated captions, outperforming existing approaches in handling multimodal hallucinations.
arXiv Detail & Related papers (2025-03-09T07:07:03Z) - Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding [66.06337890279839]
Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding for downstream multi-modal tasks. LVLMs still suffer from generating hallucinations in complex generation tasks, leading to inconsistencies between visual inputs and generated content. We propose an Inter-Modality Correlation Calibration Decoding (IMCCD) method to mitigate hallucinations in LVLMs in a training-free manner.
arXiv Detail & Related papers (2025-01-03T17:56:28Z) - Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding [25.489832294197797]
This paper introduces the Instruction Contrastive Decoding (ICD) method, a novel approach designed to reduce hallucinations during LVLM inference.
Our method is inspired by our observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules.
arXiv Detail & Related papers (2024-03-27T16:04:47Z) - OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation [124.9008419182485]
We present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy.
Our approach begins with an interesting observation that, most hallucinations are closely tied to the knowledge aggregation patterns in the self-attention matrix.
Based on this observation, OPERA introduces a penalty term on the model logits during beam-search decoding to mitigate the over-trust issue; a hedged sketch of such a penalty appears after this list.
arXiv Detail & Related papers (2023-11-29T18:57:07Z)
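The OPERA entry above describes penalising over-trust in knowledge-aggregation patterns during beam search. Its exact scoring function is defined only in that paper, so the sketch below is an approximation of the general idea: measure how strongly recent self-attention concentrates on a single cached token and subtract that from each beam's cumulative score. The function names, tensor layout, and the alpha weight are all hypothetical.

```python
# Hedged sketch of an OPERA-style over-trust penalty for beam re-scoring
# (illustrative only; not the method as published).
import torch


def over_trust_penalty(attn_weights, window=8):
    """Score how strongly the last `window` queries pile attention on one key.

    attn_weights : [heads, seq_len, seq_len] self-attention of the last layer
    Returns a scalar; larger means attention is concentrated on a single
    "aggregation" token rather than spread over visual/context tokens.
    """
    recent = attn_weights[:, -window:, :].mean(dim=0)   # [window, seq_len]
    col_mass = recent.mean(dim=0)                        # attention per cached key
    return col_mass.max()                                # peak column = candidate anchor token


def rescore_beams(beam_logprobs, beam_attns, alpha=1.0):
    """Subtract the over-trust penalty from each beam's cumulative score.

    beam_logprobs : [num_beams] cumulative log-probabilities from the decoder
    beam_attns    : list of per-beam attention tensors as described above
    alpha         : penalty weight (hypothetical hyper-parameter)
    """
    penalties = torch.stack([over_trust_penalty(a) for a in beam_attns])
    return beam_logprobs - alpha * penalties
```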