Fugu-MT paper translation (abstract): Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs

Paper overview: Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs

arxiv url: http://arxiv.org/abs/2508.19366v1
Date: Tue, 26 Aug 2025 18:54:52 GMT
Status: Translation complete
Last updated in system: 2025-08-28 19:07:41.401267
Title: Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs
Title (reference translation): Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in Multimodal LLMs
Authors: Supratik Sarkar, Swagatam Das
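Abstract summary: We introduce the first rigorous information-geometric framework in diffusion dynamics for quantifying hallucinations in large language models. Our framework provides modality-aware, theoretically interpretable metrics that capture the evolution of hallucinations over time. This work establishes a principled foundation for quantifying and bounding hallucinations, transforming them from a qualitative risk into a tractable, analyzable phenomenon.
Reference score (independently computed attention measure): 19.099044165107696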
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hallucinations in large language models (LLMs) remain a fundamental obstacle to trustworthy AI, particularly in high-stakes multimodal domains such as medicine, law, and finance. Existing evaluation techniques are largely heuristic -- anchored in qualitative benchmarking or ad-hoc empirical mitigation -- providing neither principled quantification nor actionable theoretical guarantees. This gap leaves a critical blind spot in understanding how hallucinations arise, propagate, and interact across modalities. We introduce the first (to our knowledge) rigorous information geometric framework in diffusion dynamics for quantifying hallucinations in multimodal LLMs (MLLMs), advancing the field from qualitative detection to mathematically grounded measurement. Our approach represents MLLM outputs as the spectral embeddings over multimodal graph Laplacians and characterizes the manifold gaps of truth vs inconsistencies as the semantic distortion, enabling the tight Rayleigh--Ritz bounds on the multimodal hallucination energy as a functional of time-dependent temperature profiles. By leveraging eigenmode decompositions in Reproducing Kernel Hilbert Space (RKHS) embeddings, our framework delivers modality-aware, theoretically interpretable metrics that capture the evolution of hallucinations across time and input prompts through temperature annealing. This work establishes a principled foundation for quantifying and bounding hallucinations, transforming them from a qualitative risk to a tractable, analyzable phenomenon.
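Abstract (reference translation): Hallucinations in large language models (LLMs) are a fundamental obstacle to trustworthy AI, particularly in high-stakes multimodal domains such as medicine, law, and finance. Existing evaluation techniques are largely heuristic, anchored in qualitative benchmarking or ad hoc empirical mitigation, and provide neither principled quantification nor actionable theoretical guarantees. This gap leaves a critical blind spot in understanding how hallucinations arise, propagate, and interact across modalities. We introduce the first (to our knowledge) rigorous information-geometric framework in diffusion dynamics for quantifying hallucinations in multimodal LLMs (MLLMs), advancing the field from qualitative detection to mathematically grounded measurement. The approach represents MLLM outputs as spectral embeddings over multimodal graph Laplacians and characterizes the manifold gap between truth and inconsistency as semantic distortion. By leveraging eigenmode decompositions in Reproducing Kernel Hilbert Space (RKHS) embeddings, it delivers theoretically interpretable metrics that capture the evolution of hallucinations across time and input prompts through temperature annealing. This work establishes a principled foundation for quantifying and bounding hallucinations, transforming them from a qualitative risk into a tractable, analyzable phenomenon.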
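
The abstract refers to Rayleigh-Ritz bounds on an energy defined over graph Laplacians. As a generic illustration only, and not the authors' formulation, the sketch below checks the standard bound lambda_min * ||x||^2 <= x^T L x <= lambda_max * ||x||^2 on a toy graph. The three-node graph, the signal `x`, and the function names `graph_laplacian` and `rayleigh_ritz_bounds` are all assumptions made here for illustration; none of them come from the paper.

```python
import numpy as np


def graph_laplacian(weights):
    """Return the combinatorial Laplacian L = D - W of a symmetric weight matrix."""
    w = np.asarray(weights, dtype=float)
    degree = np.diag(w.sum(axis=1))
    return degree - w


def rayleigh_ritz_bounds(laplacian, signal):
    """Return (lower, energy, upper) for the quadratic form x^T L x.

    lower = lambda_min * ||x||^2 and upper = lambda_max * ||x||^2,
    which bracket the energy for any symmetric matrix L.
    """
    eigenvalues = np.linalg.eigvalsh(laplacian)  # sorted in ascending order
    norm_sq = float(signal @ signal)
    energy = float(signal @ laplacian @ signal)
    return eigenvalues[0] * norm_sq, energy, eigenvalues[-1] * norm_sq


# Hypothetical toy example: a path graph on three nodes with unit weights.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = graph_laplacian(W)

# Hypothetical signal on the nodes (an eigenvector of L with eigenvalue 1).
x = np.array([1.0, 0.0, -1.0])

lower, energy, upper = rayleigh_ritz_bounds(L, x)
print(f"{lower:.3f} <= {energy:.3f} <= {upper:.3f}")
```

In this toy case the Laplacian eigenvalues are 0, 1, and 3, so the energy of any signal is bracketed by 0 and three times its squared norm. How the paper defines its hallucination energy, its multimodal Laplacians, and the temperature dependence is not specified in the abstract and is not modeled here.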

Related papers