Fugu-MT Paper Translation (Summary): Steering the Verifiability of Multimodal AI Hallucinations

Paper summary: Steering the Verifiability of Multimodal AI Hallucinations

arxiv url: http://arxiv.org/abs/2604.06714v1
Date: Wed, 08 Apr 2026 06:13:16 GMT
Status: Translation complete
Last updated in system: 2026-04-09 17:30:51.363174
Title: Steering the Verifiability of Multimodal AI Hallucinations
Title (reference translation): Steering the Verifiability of Multimodal AI Hallucinations
Authors: Jianhong Pang, Ruoxi Cheng, Ziyi Ye, Xingjun Ma, Zuxuan Wu, Xuanjing Huang, Yu-Gang Jiang,
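Abstract summary: Multimodal large language models (MLLMs) are prone to hallucinations and pose considerable risks to human users. This work proposes an activation-space intervention method that learns separate probes for obvious and elusive hallucinations. It shows that obvious and elusive hallucinations elicit different intervention probes, which enables fine-grained control over the model's verifiability.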
Reference score (independently computed attention score): 115.51077572812862
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI applications driven by multimodal large language models (MLLMs) are prone to hallucinations and pose considerable risks to human users. Crucially, such hallucinations are not equally problematic: some hallucinated content can be detected by human users (i.e., obvious hallucinations), while other content is often missed or requires more verification effort (i.e., elusive hallucinations). This indicates that multimodal AI hallucinations vary significantly in their verifiability. Yet little research has explored how to control this property for AI applications with diverse security and usability demands. To address this gap, we construct a dataset from 4,470 human responses to AI-generated hallucinations and categorize these hallucinations into obvious and elusive types based on their verifiability by human users. Further, we propose an activation-space intervention method that learns separate probes for obvious and elusive hallucinations. We reveal that obvious and elusive hallucinations elicit different intervention probes, allowing for fine-grained control over the model's verifiability. Empirical results demonstrate the efficacy of this approach and show that targeted interventions yield superior performance in regulating the corresponding verifiability. Moreover, simply mixing these interventions enables flexible control over the verifiability required for different scenarios.
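Abstract (reference translation): AI applications driven by multimodal large language models (MLLMs) are prone to hallucinations and pose considerable risks to human users. Importantly, these hallucinations are not equally problematic: some hallucinated content is detected by human users (obvious hallucinations), while other content is often missed or requires more verification effort (elusive hallucinations). This shows that multimodal AI hallucinations differ markedly in their verifiability. However, little research has examined how to control this property for AI applications with varied security and usability requirements. To address this gap, the authors build a dataset from 4,470 human responses to AI-generated hallucinations and classify these hallucinations as obvious or elusive according to how verifiable they are for human users. They further propose an activation-space intervention method that learns separate probes for obvious and elusive hallucinations. They show that the two types elicit different intervention probes, which enables fine-grained control over the model's verifiability. Experiments demonstrate the effectiveness of the method and show that targeted interventions perform best at regulating the corresponding verifiability. In addition, simply mixing these interventions allows flexible control over the verifiability required in different scenarios.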
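
The abstract describes three steps: learning separate probes for obvious and elusive hallucinations, intervening in activation space, and mixing the two interventions. It does not say how the probes are trained or applied, so the following is only an illustrative sketch under stated assumptions. It substitutes a difference-of-means direction (a common activation-steering heuristic, not necessarily the authors' method), uses toy 3-dimensional vectors in place of MLLM hidden states, and every function name, the mixing weight, and the steering strength are hypothetical.

```python
# Illustrative sketch only: difference-of-means probes stand in for the
# paper's (unspecified) probe-learning procedure. All names are hypothetical.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def difference_of_means_probe(positive, negative):
    """Unit direction pointing from the negative-class mean to the positive-class mean."""
    diff = [p - q for p, q in zip(mean_vector(positive), mean_vector(negative))]
    norm = dot(diff, diff) ** 0.5
    if norm == 0.0:
        raise ValueError("class means coincide; no direction to learn")
    return [d / norm for d in diff]

def mix_probes(obvious_probe, elusive_probe, weight):
    """Convex combination; weight=1.0 uses only the obvious probe (assumed mixing rule)."""
    return [weight * o + (1.0 - weight) * e
            for o, e in zip(obvious_probe, elusive_probe)]

def steer(hidden, direction, strength):
    """Shift a hidden state along a direction by the given strength."""
    return [h + strength * d for h, d in zip(hidden, direction)]

# Toy activations (assumed data, not from the paper's dataset).
obvious = [[1.0, 0.0, 0.1], [1.2, 0.1, 0.0]]
elusive = [[0.0, 1.0, 0.1], [0.1, 1.1, 0.0]]
faithful = [[0.0, 0.0, 0.0], [0.1, 0.1, 0.1]]

# One probe per hallucination type, each contrasted with faithful outputs.
obvious_probe = difference_of_means_probe(obvious, faithful)
elusive_probe = difference_of_means_probe(elusive, faithful)

# Blend the two interventions and apply the result to one hidden state.
mixed = mix_probes(obvious_probe, elusive_probe, weight=0.5)
hidden = [0.3, 0.3, 0.3]
steered = steer(hidden, mixed, strength=2.0)
```

In this sketch the `weight` argument plays the role of the mixing described in the abstract: moving it between 0.0 and 1.0 shifts the intervention from the elusive direction toward the obvious one. The sign and magnitude of `strength`, and the layer at which such a shift would be applied in a real model, are left open here because the abstract does not state them.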
