Fugu-MT Paper Translation (Abstract): VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing

Paper overview: VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing

arxiv url: http://arxiv.org/abs/2604.19412v1
Date: Tue, 21 Apr 2026 12:40:07 GMT
Status: Translation complete
Last updated in system: 2026-04-22 22:41:49.769018
Title: VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing
Title (reference translation): VCE: A zero-cost hallucination mitigation method for LVLMs via visual contrastive editing
Authors: Yanbin Huang, Yisen Li, Guiyao Tie, Xiaoye Qu, Pan Zhou, Hongfei Wang, Zhaofan Zou, Hao Sun, Xuelong Li
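Abstract summary: Large vision-language models (LVLMs) often suffer from object hallucination (OH). Recent studies suggest that the hallucination problem may stem from language priors. This paper proposes Visual Contrastive Editing (VCE).
Reference score (independently computed attention score): 70.82867621856968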
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large vision-language models (LVLMs) frequently suffer from Object Hallucination (OH), wherein they generate descriptions containing objects that are not actually present in the input image. This phenomenon is particularly problematic in real-world applications such as medical imaging and autonomous driving, where accuracy is critical. Recent studies suggest that the hallucination problem may stem from language priors: biases learned during pretraining that cause LVLMs to generate words based on their statistical co-occurrence. To mitigate this problem, we propose Visual Contrastive Editing (VCE), a novel post-hoc method that identifies and suppresses hallucinatory tendencies by analyzing the model's response to contrastive visual perturbations. Using Singular Value Decomposition (SVD), we decompose the model's activation patterns to isolate hallucination subspaces and apply targeted parameter edits to attenuate their influence. Unlike existing approaches that require fine-tuning or labeled data, VCE operates as a label-free intervention, making it both scalable and practical for deployment in resource-constrained settings. Experimental results demonstrate that VCE effectively reduces object hallucination across multiple benchmarks while maintaining the model's original computational efficiency.
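Abstract (reference translation): Large vision-language models (LVLMs) often suffer from object hallucination (OH), generating descriptions that contain objects not actually present in the input image. This phenomenon is especially problematic in real-world applications where accuracy is critical, such as medical imaging and autonomous driving. Recent studies suggest that the hallucination problem stems from language priors: biases learned during pretraining that lead LVLMs to generate words based on their statistical co-occurrence. To mitigate this problem, we propose Visual Contrastive Editing (VCE), a method that identifies and suppresses hallucinatory tendencies by analyzing the model's response to contrastive visual perturbations. Using Singular Value Decomposition (SVD), we decompose the model's activation patterns to isolate hallucination subspaces and apply targeted parameter edits to attenuate their influence. Unlike existing approaches that require fine-tuning or labeled data, VCE operates as a label-free intervention, which makes it scalable and practical to deploy in resource-constrained settings. Experiments show that VCE effectively reduces object hallucination across multiple benchmarks while preserving the model's original computational efficiency.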
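
The abstract describes the pipeline only at a high level: contrast activations under visual perturbation, use SVD to isolate a subspace, then edit parameters to attenuate it. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The function names (`hallucination_subspace`, `edit_weights`), the use of a plain activation difference, the choice of `rank`, and the projection-style edit are all assumptions made for illustration; the paper may select layers, perturbations, and the edit rule differently.

```python
import numpy as np


def hallucination_subspace(acts_clean, acts_perturbed, rank):
    """Return an orthonormal basis (d, rank) for the dominant directions
    of the activation difference. Hypothetical helper, for illustration."""
    # Rows are samples, columns are hidden features: shape (n_samples, d).
    diff = acts_perturbed - acts_clean
    # Right singular vectors are directions in feature space, ordered by
    # how much of the difference they explain.
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:rank].T


def edit_weights(weight, basis, strength=1.0):
    """Attenuate the part of a layer's output that lies in span(basis).

    `weight` has shape (d, d_in) and maps inputs to d-dimensional outputs.
    strength=1.0 removes the component entirely; 0.0 leaves it unchanged.
    Hypothetical edit rule, assumed here rather than taken from the paper.
    """
    projector = basis @ basis.T          # (d, d) projection onto the subspace
    return weight - strength * (projector @ weight)
```

Because the projection is folded into the weight matrix once, the edited layer keeps its original shape, which is consistent with the abstract's claim that inference cost is unchanged. No labels are needed in this sketch either: only paired activations from original and perturbed inputs.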

Related papers