Fugu-MT Paper Translation (Abstract): PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning

Paper overview: PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning

arxiv url: http://arxiv.org/abs/2510.19183v1
Date: Wed, 22 Oct 2025 02:41:07 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:14.937146
Title: PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning
Title (reference translation): PruneHal: Reducing hallucinations in multi-modal large language models through adaptive KV cache pruning
Authors: Fengyuan Sun, Hui Chen, Xinhao Xu, Dandan Zheng, Jingdong Chen, Jun Zhou, Jungong Han, Guiguang Ding
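Abstract summary: Hallucinations in multi-modal large language models (MLLMs) are strongly associated with insufficient attention allocated to visual tokens. The authors propose PruneHal, a training-free, simple yet effective method that leverages adaptive KV cache pruning to focus the model on critical visual information.
Reference score (attention score computed independently by this site): 87.35309934860938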
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While multi-modal large language models (MLLMs) have made significant progress in recent years, the issue of hallucinations remains a major challenge. To mitigate this phenomenon, existing solutions either introduce additional data for further training or incorporate external or internal information during inference. However, these approaches inevitably introduce extra computational costs. In this paper, we observe that hallucinations in MLLMs are strongly associated with insufficient attention allocated to visual tokens. In particular, the presence of redundant visual tokens disperses the model's attention, preventing it from focusing on the most informative ones. As a result, critical visual cues are often under-attended, which in turn exacerbates the occurrence of hallucinations. Building on this observation, we propose PruneHal, a training-free, simple yet effective method that leverages adaptive KV cache pruning to enhance the model's focus on critical visual information, thereby mitigating hallucinations. To the best of our knowledge, we are the first to apply token pruning for hallucination mitigation in MLLMs. Notably, our method does not require additional training and incurs nearly no extra inference cost. Moreover, PruneHal is model-agnostic and can be seamlessly integrated with different decoding strategies, including those specifically designed for hallucination mitigation. We evaluate PruneHal on several widely used hallucination evaluation benchmarks using four mainstream MLLMs, achieving robust and outstanding results that highlight the effectiveness and superiority of our method. Our code will be publicly available.
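Abstract (reference translation): Multi-modal large language models (MLLMs) have advanced considerably in recent years, but hallucination remains a major challenge. To mitigate it, existing solutions either introduce additional data for further training or bring in external or internal information during inference. These approaches, however, inevitably add computational cost. This paper shows that hallucinations in MLLMs are strongly associated with insufficient attention allocated to visual tokens. In particular, redundant visual tokens disperse the model's attention and keep it from focusing on the most informative tokens. As a result, critical visual cues are often under-attended, which makes hallucinations more frequent. The authors therefore propose PruneHal, a training-free, simple yet effective method that uses adaptive KV cache pruning to focus the model on critical visual information and thereby mitigate hallucinations. To the best of their knowledge, they are the first to apply token pruning to hallucination mitigation in MLLMs. Notably, the method requires no additional training and incurs almost no extra inference cost. PruneHal is also model-agnostic and can be integrated seamlessly with different decoding strategies, including those designed specifically for hallucination mitigation. PruneHal is evaluated with four mainstream MLLMs on several widely used hallucination evaluation benchmarks, yielding robust and strong results that demonstrate the method's effectiveness and superiority. The code will be made publicly available.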
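
The idea in the abstract, that removing redundant visual tokens from the KV cache concentrates attention on the informative ones, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the importance score or how the pruning adapts, so the attention-based ranking, the fixed `keep_ratio`, and the function names `attend` and `prune_visual_kv` are all assumptions made for illustration, on a single attention head with made-up vectors.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Scaled dot-product attention weights of one query over cached keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

def prune_visual_kv(keys, values, attn, visual_idx, keep_ratio):
    """Drop the least-attended visual entries from a toy KV cache.

    ASSUMPTION: tokens are ranked by the attention they received and a
    fixed fraction is kept; PruneHal's actual adaptive rule is not
    described in the abstract.
    """
    n_keep = max(1, math.ceil(len(visual_idx) * keep_ratio))
    ranked = sorted(visual_idx, key=lambda i: attn[i], reverse=True)
    kept_visual = set(ranked[:n_keep])
    visual = set(visual_idx)
    # Text tokens are always kept; visual tokens only if highly attended.
    kept = [i for i in range(len(keys)) if i not in visual or i in kept_visual]
    return [keys[i] for i in kept], [values[i] for i in kept], kept

# Toy cache: positions 0-3 are visual tokens, 4-5 are text tokens.
keys = [[2.0, 0.0], [0.1, 0.0], [0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [0.5, 0.0]]
values = [[float(i)] for i in range(len(keys))]
query = [1.0, 0.0]

before = attend(query, keys)
pruned_keys, pruned_values, kept = prune_visual_kv(
    keys, values, before, visual_idx=[0, 1, 2, 3], keep_ratio=0.5
)
after = attend(query, pruned_keys)
print("kept positions:", kept)
print("attention on visual token 0 before/after:", before[0], after[0])
```

Because softmax renormalizes over the remaining entries, the weight on the most-attended visual token rises once the low-scoring visual entries are gone, and the cache shrinks rather than grows, which is consistent with the abstract's claim of nearly no extra inference cost.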

Related papers