Fugu-MT Paper Translation (Summary): Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

Paper summary: Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

arxiv url: http://arxiv.org/abs/2605.28264v1
Date: Wed, 27 May 2026 10:12:44 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.960395
Title: Entropy Distribution as a Fingerprint for Hallucinations in Generative Models
Title (reference translation): Entropy distribution as a fingerprint of hallucinations in generative models
Authors: Mattia J. Villani, Pranav Deshpande, Akshay Seshadri, Romina Yalovetzky, Niraj Kumar,
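Abstract summary: The Calibrated Entropy Score (CES) is a lightweight algorithm for hallucination detection. CES combines the mean signal and the maximum signal of the generated entropy through a calibrated reference CDF. CES is statistically indistinguishable from multi-sample methods that require far greater computational cost.
Reference score (independently computed attention score): 1.5135066879411019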
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) often generate factually incorrect outputs, commonly termed hallucinations, that undermine trust and limit deployment in high-stakes settings. Existing hallucination detection methods typically require multiple forward passes, or access to model internals. In this work, we provide theoretical background and empirical evidence that the distribution of token-level entropies, beyond the mean captured by perplexity or length-normalised entropy, serves as a fingerprint of hallucination, with distributional shape and tail behaviour carrying independent signal. We formalize hallucination detection as a statistical hypothesis test and propose the Calibrated Entropy Score (CES), a lightweight algorithm requiring only a single forward pass and black-box access to token logits. CES combines the mean signal with the maximum signal of the generated entropy through a calibrated reference CDF, producing scores that are directly comparable across models and tasks. We establish finite-sample calibration guarantees via a novel random-length Dvoretzky--Kiefer--Wolfowitz inequality, and also prove that CES detects hallucinations with probability converging to one exponentially fast in the generation length. Across eight QA benchmarks and ten generator models spanning open-source and API access models, CES achieves the highest detection performance among all single-pass black-box methods while providing formal error guarantees that existing heuristics lack. Remarkably, CES is statistically indistinguishable from multi-sample methods that require far greater computational cost, closing the gap between lightweight and expensive detection and making it suitable for real-time, large-scale deployment.
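The abstract describes CES only at a high level, so the Python sketch that follows is a minimal illustration under stated assumptions, not the authors' method. It computes the Shannon entropy of the next-token distribution at each generation step from the logits, takes the mean and the maximum of those entropies, maps each through an empirical reference CDF, and combines the two percentiles with a simple average. The reference samples (`ref_means`, `ref_maxes`), the averaging rule, and all function names are hypothetical; the paper's exact calibration procedure and combination rule are not given in this summary.

```python
import bisect
import math
from typing import Callable, List, Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    m = max(logits)
    shifted = [z - m for z in logits]          # subtract max to avoid overflow
    log_total = math.log(sum(math.exp(s) for s in shifted))
    entropy = 0.0
    for s in shifted:
        log_p = s - log_total                  # log-softmax
        entropy -= math.exp(log_p) * log_p
    return entropy


def empirical_cdf(reference: Sequence[float]) -> Callable[[float], float]:
    """Return F(x) = fraction of reference values that are <= x."""
    data = sorted(reference)
    n = len(data)
    return lambda x: bisect.bisect_right(data, x) / n


def ces_sketch_score(
    step_logits: List[Sequence[float]],
    ref_means: Sequence[float],
    ref_maxes: Sequence[float],
) -> float:
    """Hypothetical CES-style score in [0, 1].

    Higher values mean the generation's entropies sit high relative to
    the (assumed) reference samples.
    """
    entropies = [token_entropy(z) for z in step_logits]  # one per generated token
    mean_h = sum(entropies) / len(entropies)             # mean signal
    max_h = max(entropies)                               # maximum (tail) signal
    f_mean = empirical_cdf(ref_means)
    f_max = empirical_cdf(ref_maxes)
    # Assumed combination rule: average the two calibrated percentiles.
    return 0.5 * (f_mean(mean_h) + f_max(max_h))


# Two generated tokens, each with uniform logits over a 4-token vocabulary.
score = ces_sketch_score(
    [[0.0] * 4, [0.0] * 4],
    ref_means=[0.1, 0.2, 0.3, 0.4],
    ref_maxes=[0.5, 1.0, 2.0, 3.0],
)
print(score)  # 0.75
```

Only the per-step logits are needed, which matches the single-pass, black-box setting the abstract describes. In this sketch a score near 1 would then be thresholded to flag a likely hallucination, with the threshold chosen on held-out data; that thresholding step is likewise an assumption rather than a detail from the paper.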
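Abstract (reference translation): Large language models (LLMs) often generate factually incorrect outputs, which undermine trust and limit deployment in high-stakes settings. Existing hallucination detection methods typically require multiple forward passes or access to model internals. In this work, we show that the distribution of token-level entropies, beyond the mean captured by perplexity or length-normalized entropy, serves as a fingerprint of hallucination. We formalize hallucination detection as a statistical hypothesis test and propose the Calibrated Entropy Score (CES), a lightweight algorithm that requires only a single forward pass and black-box access to token logits. CES combines the mean signal and the maximum signal of the generated entropy through a calibrated reference CDF, producing scores that are directly comparable across models and tasks. We establish finite-sample calibration guarantees via a novel random-length Dvoretzky-Kiefer-Wolfowitz inequality, and also prove that CES detects hallucinations with probability converging to one exponentially fast in the generation length. Across eight QA benchmarks and ten generator models spanning open-source and API-access models, CES achieves the highest detection performance among all single-pass black-box methods while providing formal error guarantees that existing heuristics lack. Notably, CES is statistically indistinguishable from multi-sample methods that require far greater computational cost.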
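The finite-sample calibration guarantee in the abstract rests on a novel random-length variant of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which is not reproduced here. For orientation only, this sketch shows the classical fixed-sample-size DKW bound with Massart's constant, P(sup_x |F_n(x) - F(x)| > ε) ≤ 2 exp(-2nε²), solved for the band half-width ε and for the reference-set size n. The function names are hypothetical.

```python
import math


def dkw_epsilon(n: int, alpha: float) -> float:
    """Half-width of the classical DKW confidence band (Massart constant).

    With probability at least 1 - alpha, sup_x |F_n(x) - F(x)| <= epsilon
    for an empirical CDF F_n built from n i.i.d. samples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def calibration_size(epsilon: float, alpha: float) -> int:
    """Smallest n whose DKW band half-width is at most epsilon."""
    return math.ceil(math.log(2.0 / alpha) / (2.0 * epsilon ** 2))


print(round(dkw_epsilon(1000, 0.05), 3))  # 0.043
print(calibration_size(0.05, 0.05))       # 738
```

For example, α = 0.05 with n = 1000 reference samples gives ε ≈ 0.043, and guaranteeing ε ≤ 0.05 at α = 0.05 needs n = 738. The classical bound assumes a fixed n; the random-length version in the paper presumably handles the fact that generation lengths vary, but its exact form is not given in this summary.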

Related papers