Fugu-MT Paper Translation (Summary): Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations

Paper overview: Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations

arxiv url: http://arxiv.org/abs/2509.11287v1
Date: Sun, 14 Sep 2025 14:26:53 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:23.013461
Title: Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations
Title (reference translation): Mitigating hallucinations in vision-language models via self-injected hallucinations
Authors: Yifan Lu, Ziqi Zhang, Chunfeng Yuan, Jun Gao, Congxuan Zhang, Xiaojuan Qi, Bing Li, Weiming Hu
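Abstract summary: Hallucination mitigation methods are mainly based on preference alignment, and collecting preference data requires external human annotations or auxiliary models. This paper proposes Autonomous Preference Alignment via Self-Injection (APASI), a novel and generalizable method that mitigates hallucinations without external dependencies. APASI uses the target LVLM to self-inject hallucinations into its generated responses, producing paired responses with different preference levels.
Reference score (attention score, computed in-house): 73.37711261605271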
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Vision-Language Models (LVLMs) suffer from serious hallucination problems, where the model-generated responses are inconsistent with the visual inputs. Existing hallucination mitigation methods are mainly based on preference alignment and require external human annotations or auxiliary models for preference data collection, which increase costs and limit sustainable improvement. To tackle these challenges, we propose Autonomous Preference Alignment via Self-Injection (APASI), a novel and generalizable method that mitigates hallucinations without external dependencies. APASI leverages the target LVLM to self-inject hallucinations into a generated response, creating a pair of responses with varying preference levels. During the self-injection process, the dis-preferred response is generated based on three key observations of hallucinations, ensuring it simulates real hallucination patterns. This fidelity offers an accurate learning signal for hallucination mitigation. Moreover, APASI incorporates an iterative alignment training strategy combined with curriculum learning to periodically update the preference data with increasing challenge, enabling stable and continuous enhancement of the LVLM. Extensive experiments across six benchmarks show that APASI not only effectively mitigates hallucinations for three baseline models but also achieves comparable or even superior performance to alignment-based methods with external dependency, thereby demonstrating its effectiveness and generalization capability. The code is available at https://github.com/davidluciolu/APASI.
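Abstract (reference translation): LVLMs (Large Vision-Language Models) suffer from serious hallucinations, in which model-generated responses contradict the visual input. Existing hallucination mitigation methods are mainly based on preference alignment and require external human annotations or auxiliary models to collect preference data, which increases costs and limits sustainable improvement. To address these challenges, we propose APASI (Autonomous Preference Alignment via Self-Injection), a novel and generalizable method that mitigates hallucinations without external dependencies. APASI uses the target LVLM to self-inject hallucinations into a generated response, producing a pair of responses with different preference levels. During self-injection, the dispreferred response is generated based on three key observations about hallucinations, so that it simulates real hallucination patterns. This fidelity provides an accurate learning signal for hallucination mitigation. In addition, APASI combines an iterative alignment training strategy with curriculum learning to periodically update the preference data with increasing difficulty, enabling stable and continuous improvement of the LVLM. Extensive experiments on six benchmarks show that APASI not only effectively mitigates hallucinations for three baseline models but also achieves performance comparable or superior to alignment-based methods that rely on external dependencies, demonstrating its effectiveness and generalization capability. The code is available at https://github.com/davidluciolu/APASI.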
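The abstract describes training on pairs made of a preferred response and a self-injected, hallucinated (dispreferred) response, but it does not name the alignment objective. As a minimal sketch, assuming a Direct Preference Optimization (DPO) style loss (a common choice for preference alignment, not confirmed by the abstract), the per-pair loss can be computed from sequence log-probabilities under the trained LVLM and a frozen reference model. The function name `dpo_loss` and the default `beta` are illustrative assumptions, not taken from the APASI code.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Hypothetical DPO-style loss for one (preferred, dispreferred) pair.

    Assumption: each argument is the summed token log-probability of a full
    response, scored by the policy (the LVLM being aligned) or by a frozen
    reference copy. "chosen" is the original response; "rejected" is the
    version with self-injected hallucinations.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(margin)) = log(1 + exp(-margin)), computed in a
    # numerically stable way for both signs of the margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy shifted toward the chosen response: loss drops below log 2.
    print(dpo_loss(-10.0, -20.0, -12.0, -18.0))
```

With identical policy and reference scores the margin is zero and the loss equals log 2; it decreases as the policy moves probability toward the preferred response relative to the reference model, which is the learning signal a self-injected dispreferred response would supply under this assumption.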

Related papers