Fugu-MT Paper Translation (Summary): Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation

Paper summary: Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation

arxiv url: http://arxiv.org/abs/2604.20366v1
Date: Wed, 22 Apr 2026 09:02:17 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:11.058847
Title: Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
Title (reference translation): Mitigating hallucinations in large vision-language models without performance degradation
Authors: Xingyu Zhu, Junfeng Fang, Shuo Wang, Beier Zhu, Zhicai Wang, Yonghui Yang, Xiangnan He
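Abstract summary: Large Vision-Language Models (LVLMs) show strong generative capabilities but often produce hallucinations that undermine output reliability. This paper proposes MPD, a dual-stage framework for mitigating hallucinations without performance degradation. MPD achieves state-of-the-art performance, reducing hallucinations by 23.4% while maintaining 97.4% of general generative capability.
Reference score (independently computed attention score): 39.98529932278864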
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Vision-Language Models (LVLMs) exhibit powerful generative capabilities but frequently produce hallucinations that compromise output reliability. Fine-tuning on annotated data devoid of hallucinations offers the most direct solution, while its high computational cost motivates recent representation-based methods, which focus on mitigating hallucinatory components within hidden representations. Though efficient, we empirically observe that these methods degrade general generation capacity due to incomplete extraction of hallucination components and non-selective parameter updates. To address these limitations, we propose MPD, a dual-stage framework for mitigating hallucinations without performance degradation. Specifically, our MPD relies on two essential factors: (1) semantic-aware component disentanglement to extract pure hallucination components, and (2) interpretable parameter updates that selectively modify parameters most relevant to hallucination. Extensive experiments demonstrate that MPD achieves state-of-the-art performance, reducing hallucinations by 23.4% while maintaining 97.4% of general generative capability as evaluated on LLaVA-Bench and MME, with no additional computational cost.
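Abstract (reference translation): Large Vision-Language Models (LVLMs) show strong generative capabilities but often produce hallucinations that undermine output reliability. Fine-tuning on annotated, hallucination-free data is the most direct solution, but its high computational cost has motivated recent representation-based methods that focus on mitigating hallucination components within hidden representations. Although these methods are efficient, the authors empirically observe that they degrade general generation capacity because of incomplete extraction of hallucination components and non-selective parameter updates. To address these limitations, the authors propose MPD, a dual-stage framework for mitigating hallucinations without performance degradation. Specifically, MPD relies on two key factors: (1) semantic-aware component disentanglement to extract pure hallucination components, and (2) interpretable parameter updates that selectively modify the parameters most relevant to hallucination. Extensive experiments show that MPD achieves state-of-the-art performance, reducing hallucinations by 23.4% while maintaining 97.4% of general generative capability as evaluated on LLaVA-Bench and MME.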
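
Note: The abstract characterizes representation-based methods as mitigating hallucination components within hidden representations, but gives no algorithmic detail. As a rough, generic illustration of that idea only (this is not MPD, and not necessarily how any cited method works), removing a component along a given direction can be sketched as an orthogonal projection. The vectors `hidden` and `direction` and the helper `remove_component` are hypothetical toy names introduced here for illustration.

```python
# Generic illustration only: NOT the MPD algorithm from the paper.
# `hidden` stands in for a hidden representation and `direction` for an
# estimated "hallucination" direction; both are hypothetical toy vectors.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def remove_component(hidden, direction):
    """Return `hidden` with its projection onto `direction` removed."""
    norm_sq = dot(direction, direction)
    if norm_sq == 0:
        return list(hidden)  # a zero direction has no component to remove
    scale = dot(hidden, direction) / norm_sq
    return [h - scale * d for h, d in zip(hidden, direction)]

hidden = [3.0, 4.0, 0.0]
direction = [1.0, 0.0, 0.0]
cleaned = remove_component(hidden, direction)
print(cleaned)  # the result is orthogonal to `direction`
```

In this toy setting the part of `hidden` lying along `direction` is subtracted and the remaining coordinates are left unchanged. How such a direction is extracted, and which parameters are updated, are exactly the points on which the abstract says MPD differs from earlier methods.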

List of related papers