Fugu-MT Paper Translation (Abstract): Rethinking Visual Neglect: Steering via Context-Preference for MLLM Hallucination Mitigation

Paper overview: Rethinking Visual Neglect: Steering via Context-Preference for MLLM Hallucination Mitigation

arxiv url: http://arxiv.org/abs/2605.27993v1
Date: Wed, 27 May 2026 05:33:06 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.774481
Title: Rethinking Visual Neglect: Steering via Context-Preference for MLLM Hallucination Mitigation
Title (reference translation): Rethinking Visual Neglect: Steering by Context Preference to Mitigate MLLM Hallucination
Authors: Jingwen Wu, Xijun Zhang, Ge Song
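Abstract summary: The authors argue that the image, as a context, simultaneously competes with the model's parametric knowledge and the textual context. They propose Context-Preference Activation Steering (CAS), a training-free framework that extracts two semantically distinct Context Preference Vectors. Experiments show that CAS substantially mitigates object hallucinations without increasing decoding latency while preserving native text-generation quality.
Reference score (attention score computed independently by the site): 5.041079621345155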
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Object hallucination remains a primary obstacle to the reliable deployment of Multimodal Large Language Models (MLLMs). Current inference-time mitigation methods mainly assume hallucinations stem from visual neglect, steering models to enhance visual reliance. In contrast, our systematic interventions on multiple MLLMs show that pushing toward more visual reliance may exacerbate hallucinations on some models, while less may mitigate hallucinations. This result suggests that attributing hallucinations solely to visual insufficiency is underdetermined. We argue that the image, as a context, simultaneously competes with the model's parametric knowledge and the textual context. For this, we propose a training-free framework, Context-Preference Activation Steering (CAS). It extracts two semantically distinct Context Preference Vectors (CPVs) via two small sets of designed conflict samples and applies them via single-pass signed residual injection at mid-early MLP layers during inference to control information reliance. Experiments show that CAS substantially mitigates object hallucinations without increasing decoding latency and preserves native text-generation quality.
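Abstract (reference translation): Object hallucination remains a major obstacle to the reliable deployment of Multimodal Large Language Models (MLLMs). Current inference-time mitigation methods mostly assume that hallucinations arise from visual neglect, and steer models toward greater visual reliance. In contrast, systematic interventions on multiple MLLMs show that pushing toward more visual reliance can worsen hallucinations on some models, while pushing toward less can mitigate them. This result suggests that attributing hallucinations solely to visual insufficiency is underdetermined. The authors argue that the image, as a context, simultaneously competes with the model's parametric knowledge and the textual context. They therefore propose Context-Preference Activation Steering (CAS), a training-free framework. It extracts two semantically distinct Context Preference Vectors (CPVs) from two small sets of designed conflict samples and, during inference, applies them by single-pass signed residual injection at mid-early MLP layers to control which information source the model relies on. Experiments show that CAS substantially mitigates object hallucinations without increasing decoding latency while preserving native text-generation quality.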
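
The extract-then-inject idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the CPVs are computed, so the difference-of-means rule below is an assumption borrowed from common activation-steering practice, and all names (`context_preference_vector`, `inject`), the toy activations, and the coefficients are hypothetical. Only the overall shape (two vectors, signed coefficients, one addition to an MLP output) follows the abstract.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def context_preference_vector(acts_a, acts_b):
    """Unit-norm difference of mean activations between two conflict-sample sets.

    ASSUMPTION: difference-of-means is a stand-in; the abstract does not
    specify the extraction rule.
    """
    diff = [a - b for a, b in zip(mean_vector(acts_a), mean_vector(acts_b))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def inject(mlp_out, cpvs, coeffs):
    """Add a signed combination of CPVs to one MLP output vector (single pass)."""
    delta = [sum(c * v[i] for c, v in zip(coeffs, cpvs)) for i in range(len(mlp_out))]
    return [h + d for h, d in zip(mlp_out, delta)]


# Toy activations (hypothetical): each row is one conflict sample's hidden state.
# Set 1: image versus parametric knowledge.
cpv_param = context_preference_vector(
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],  # samples where the image is followed
    [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]],  # samples where the prior is followed
)
# Set 2: image versus textual context.
cpv_text = context_preference_vector(
    [[0.0, 4.0, 1.0], [0.0, 2.0, 1.0]],  # samples where the image is followed
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # samples where the text is followed
)

mlp_out = [0.5, 0.5, 0.5]
# Signed coefficients: positive pushes toward the first set, negative away from it.
steered = inject(mlp_out, [cpv_param, cpv_text], [2.0, -1.0])
print(steered)
```

Because the injection is a single vector addition at a chosen layer, it adds no extra forward passes, which is consistent with the abstract's claim that decoding latency does not increase. In a real model the addition would be attached to selected MLP layers rather than applied to a hand-written list.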


Related papers