Fugu-MT paper translation (abstract): Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation

Paper abstract: Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation

arxiv url: http://arxiv.org/abs/2603.16664v1
Date: Tue, 17 Mar 2026 15:30:47 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.376083
Title: Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation
Title (reference translation): Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation
Authors: Jiawei Mao, Hardy Chen, Haoqin Tu, Yuhan Wang, Letian Zhang, Zeyu Zheng, Huaxiu Yao, Zirui Wang, Cihang Xie, Yuyin Zhou
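Abstract summary: Large vision-language models (LVLMs) have become increasingly strong but remain prone to hallucinations in multimodal tasks. Because training these LVLMs to avoid hallucinations becomes prohibitively expensive for larger models, training-free methods offer a cheap and flexible solution to this problem. We propose Kestrel, a training-free framework for LVLM hallucination mitigation that combines an explicit visual-grounding agent with an evidence-verified self-refinement mechanism.
Reference score (independently computed attention score): 86.37623966653688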
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large vision-language models (LVLMs) have become increasingly strong but remain prone to hallucinations in multimodal tasks, which significantly limits their deployment. As training these LVLMs to avoid hallucinations becomes prohibitively expensive for larger models, training-free methods offer a cheap and flexible solution to this problem, yet existing approaches based on decoding or tool use often bring limited gains and/or weak interpretability. We propose Kestrel, a training-free framework for LVLM hallucination mitigation that combines an explicit visual-grounding agent with an evidence-verified self-refinement mechanism. In detail, Kestrel first collects explicit visual evidence and converts tool outputs into reusable, structured textual evidence. Second, to take full advantage of this evidence, Kestrel verifies it via an LVLM judge, then iteratively self-refines answers based on the verified evidence to reduce the risk of over-correction. Extensive experiments show that Kestrel improves performance over strong baselines across hallucination benchmarks (e.g., average +3.31% on POPE and +28.34 on MME-Hallucination with Qwen3-VL), while providing transparent verification traces for hallucination diagnosis and analysis; for example, the integrated self-refinement module and the grounding agent both contribute an average +2.0% gain on POPE.
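Abstract (reference translation): Large vision-language models (LVLMs) have become increasingly strong but remain prone to hallucinations in multimodal tasks, which significantly limits their deployment. Because training these LVLMs to avoid hallucinations becomes prohibitively expensive for larger models, training-free methods offer a cheap and flexible solution to this problem, but existing approaches based on decoding or tool use often bring limited gains and weak interpretability. We propose Kestrel, a training-free framework for LVLM hallucination mitigation that combines an explicit visual-grounding agent with an evidence-verified self-refinement mechanism. In detail, Kestrel first collects explicit visual evidence and converts tool outputs into reusable, structured textual evidence. Second, to take full advantage of this evidence, Kestrel verifies it via an LVLM judge and iteratively refines answers based on the verified evidence, reducing the risk of over-correction. Extensive experiments show that Kestrel improves performance over strong baselines across hallucination benchmarks (average +3.31% on POPE and +28.34 on MME-Hallucination with Qwen3-VL), while providing transparent verification traces for hallucination diagnosis and analysis.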
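
As a rough illustration of the loop the abstract describes (collect evidence, verify it with a judge, refine only on verified evidence), here is a minimal Python sketch. It is not the authors' implementation: `Evidence`, `refine_with_evidence`, the `collect`/`judge`/`refine` callables, `max_rounds`, and the toy stubs are all hypothetical names introduced here, and the stopping rules are assumptions rather than details from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Evidence:
    """Hypothetical container for one piece of structured textual evidence."""
    claim: str            # textual statement derived from a tool output
    source: str           # name of the tool that produced the claim
    verified: bool = False


def refine_with_evidence(
    question: str,
    initial_answer: str,
    collect: Callable[[str, str], List[Evidence]],
    judge: Callable[[str, Evidence], bool],
    refine: Callable[[str, str, List[Evidence]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[Dict]]:
    """Iteratively refine an answer using only judge-verified evidence.

    Returns the final answer and a per-round trace for later inspection.
    """
    answer = initial_answer
    trace: List[Dict] = []
    for round_idx in range(max_rounds):
        evidence = collect(question, answer)
        verified = [e for e in evidence if judge(question, e)]
        for e in verified:
            e.verified = True
        trace.append({
            "round": round_idx,
            "answer_before": answer,
            "verified": [e.claim for e in verified],
        })
        if not verified:
            break  # assumption: without trusted evidence, keep the answer
        new_answer = refine(question, answer, verified)
        if new_answer == answer:
            break  # assumption: stop once the answer is stable
        answer = new_answer
    return answer, trace


# Toy stand-ins for the grounding tools, the LVLM judge, and the LVLM itself.
def toy_collect(question: str, answer: str) -> List[Evidence]:
    return [
        Evidence("no dog is visible", "detector"),
        Evidence("a dog is visible", "captioner"),
    ]


def toy_judge(question: str, ev: Evidence) -> bool:
    return ev.source == "detector"  # pretend only detector output passes


def toy_refine(question: str, answer: str, verified: List[Evidence]) -> str:
    return "no" if any("no dog" in e.claim for e in verified) else answer


final_answer, trace = refine_with_evidence(
    "Is there a dog in the image?", "yes", toy_collect, toy_judge, toy_refine
)
```

In this sketch the answer changes only when the judge has accepted at least one piece of evidence, which is one simple way to guard against over-correction; the returned `trace` stands in for the verification traces the abstract mentions.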

Related papers