Fugu-MT Paper Translation (Abstract): Revealing and Enhancing Core Visual Regions: Harnessing Internal Attention Dynamics for Hallucination Mitigation in LVLMs

Paper overview: Revealing and Enhancing Core Visual Regions: Harnessing Internal Attention Dynamics for Hallucination Mitigation in LVLMs

arxiv url: http://arxiv.org/abs/2602.15556v1
Date: Tue, 17 Feb 2026 13:08:06 GMT
Status: Translation complete
System update date: 2026-02-18 16:03:18.070058
Title: Revealing and Enhancing Core Visual Regions: Harnessing Internal Attention Dynamics for Hallucination Mitigation in LVLMs
Title (reference translation): Revealing and Enhancing Core Visual Regions: Harnessing Internal Attention Dynamics for Hallucination Mitigation in LVLMs
Authors: Guangtao Lyu, Qi Liu, Chenghao Xu, Jiexi Yan, Muli Yang, Xueting Li, Fen Fang, Cheng Deng
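Abstract summary: Internal Positive Attention Dynamics (PAD) in LVLMs naturally reveal semantically core visual regions under the distortions of attention sinks. Positive Attention Dynamics Enhancement (PADE) is a training-free attention intervention that constructs a PAD map to identify semantically core visual regions.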
Reference score (independently computed attention level): 67.69730908817321
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LVLMs have achieved strong multimodal reasoning capabilities but remain prone to hallucinations, producing outputs inconsistent with visual inputs or user instructions. Existing training-free methods, including contrastive decoding and auxiliary expert models, which incur several times more computational overhead and may introduce potential interference, as well as static internal signal enhancement, are often vulnerable to the attention sink phenomenon. We find that internal Positive Attention Dynamics (PAD) in LVLMs naturally reveal semantically core visual regions under the distortions of attention sinks. Based on this, we propose Positive Attention Dynamics Enhancement (PADE), a training-free attention intervention that constructs a PAD map to identify semantically core visual regions, applies per-head Median Absolute Deviation Scaling to adaptively control the intervention strength, and leverages System-Token Compensation to maintain attention to complex user instructions and support long-term output consistency. Experiments on multiple LVLMs and benchmarks show that PADE improves visual grounding and reduces hallucinations, validating the effectiveness of leveraging internal attention dynamics for reliable multimodal reasoning.
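Abstract (reference translation): LVLMs have achieved strong multimodal reasoning capabilities but remain prone to hallucinations, producing outputs that are inconsistent with the visual input or the user's instructions. Existing training-free methods include contrastive decoding and auxiliary expert models, which incur several times more computational overhead and may introduce interference, as well as static internal signal enhancement. These methods are often vulnerable to the attention sink phenomenon. The authors find that internal Positive Attention Dynamics (PAD) in LVLMs naturally reveal semantically core visual regions under the distortions of attention sinks. Based on this, they propose Positive Attention Dynamics Enhancement (PADE), a training-free attention intervention with three parts: it constructs a PAD map to identify semantically core visual regions, applies per-head Median Absolute Deviation Scaling to adaptively control the intervention strength, and uses System-Token Compensation to maintain attention to complex user instructions and support long-term output consistency. Experiments on multiple LVLMs and benchmarks show that PADE improves visual grounding and reduces hallucinations, validating the effectiveness of leveraging internal attention dynamics for reliable multimodal reasoning.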
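
The abstract names per-head Median Absolute Deviation (MAD) scaling but gives no formulas, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a hypothetical per-token saliency list `pad_map`, a hypothetical strength knob `alpha`, and a simple additive boost followed by renormalization. All function and parameter names are illustrative.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median(|x - median(x)|)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def boost_visual_attention(attn, pad_map, visual_idx, alpha=1.0):
    """Boost one head's attention on salient visual tokens (illustrative only).

    attn       : attention weights of one head for one query, summing to 1
    pad_map    : assumed saliency score per visual token (hypothetical input)
    visual_idx : positions of the visual tokens inside attn
    alpha      : assumed global strength knob (hypothetical)
    """
    peak = max(pad_map)
    if peak <= 0:
        return list(attn)  # nothing marked as salient, leave the head untouched

    # Per-head strength: a robust measure of how spread out this head's
    # visual attention is, so each head gets its own scale.
    strength = alpha * mad([attn[i] for i in visual_idx])

    boosted = list(attn)
    for score, i in zip(pad_map, visual_idx):
        boosted[i] += strength * max(score, 0.0) / peak

    total = sum(boosted)
    return [w / total for w in boosted]  # renormalize to a distribution


# Toy example: token 0 plays the role of a system/sink token, tokens 1-3 are visual.
attn = [0.50, 0.05, 0.10, 0.20, 0.15]
pad_map = [0.0, 0.2, 1.0]
visual_idx = [1, 2, 3]
out = boost_visual_attention(attn, pad_map, visual_idx, alpha=2.0)
```

In this sketch the renormalization step lowers the weight on every non-boosted token, including the system token. The abstract says that PADE uses System-Token Compensation to keep attention on user instructions; that component is not modeled here.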

Related papers