Fugu-MT Paper Translation (Summary): Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity

Paper Overview: Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity

arxiv url: http://arxiv.org/abs/2604.13061v1
Date: Wed, 18 Mar 2026 18:10:37 GMT
Status: Translation complete
Last updated in system: 2026-04-19 19:09:11.648061
Title: Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity
Authors: Wael Hafez, Amir Nazeri
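Abstract summary: The paper shows that the integrity of multi-turn interactions can be continuously monitored using bi-predictability (P). The Information Digital Twin (IDT) is a lightweight architecture that estimates P across the context, response, and next-prompt loop without secondary inference or embeddings.
Reference score (independently calculated attention score): 0.0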
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are increasingly deployed in high-stakes autonomous and interactive workflows, where reliability demands continuous, multi-turn coherence. However, current evaluation methods either rely on post-hoc semantic judges, measure unidirectional token confidence (e.g., perplexity), or require compute-intensive repeated sampling (e.g., semantic entropy). Because these techniques focus exclusively on the model's output distribution, they cannot monitor whether the underlying interaction remains structurally coupled in real time, leaving systems vulnerable to gradual, undetected degradation. Here we show that multi-turn interaction integrity can be continuously monitored using bi-predictability (P), a fundamental information-theoretic measure computed directly from raw token frequency statistics. We introduce the Information Digital Twin (IDT), a lightweight architecture that estimates P across the context, response, and next-prompt loop without secondary inference or embeddings. Across 4,500 conversational turns between a student model and three frontier teacher models, the IDT detected injected disruptions with 100% sensitivity. Crucially, we demonstrate that structural coupling and semantic quality are empirically and practically separable: P aligned with structural consistency in 85% of conditions, but with semantic judge scores in only 44%. This reveals a critical regime of "silent uncoupling" where LLMs produce high-scoring outputs despite degrading conversational context. By decoupling structural monitoring from semantic evaluation, the IDT provides a scalable, computationally efficient mechanism for real-time AI assurance and closed-loop regulation.
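The summary does not give the formula for P, so the sketch below is only a hypothetical stand-in illustrating the general idea of scoring coupling from raw token frequencies with no model calls or embeddings. It is not the authors' bi-predictability. It scores each link of the context, response, and next-prompt loop as 1 minus the Jensen-Shannon divergence (base 2) of the two segments' token counts, a symmetric, bounded information-theoretic quantity. The tokenizer, the names `coupling`, `loop_coupling`, and `flag_turns`, and the 0.2 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def token_counts(text: str) -> Counter:
    """Raw token frequencies from a naive lowercase word tokenizer (an assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def jsd_bits(a: Counter, b: Counter) -> float:
    """Jensen-Shannon divergence in bits (range [0, 1]) between two count tables.

    Equivalent to the mutual information between a token and a fair coin
    that picks which of the two segments the token is drawn from.
    """
    if not a or not b:
        return 1.0  # hypothetical choice: an empty segment counts as fully uncoupled
    total_a, total_b = sum(a.values()), sum(b.values())
    vocab = set(a) | set(b)
    p = {t: a[t] / total_a for t in vocab}
    q = {t: b[t] / total_b for t in vocab}
    m = {t: 0.5 * (p[t] + q[t]) for t in vocab}

    def kl(x: dict, y: dict) -> float:
        # KL(x || y) in bits; tokens with x[t] == 0 contribute nothing.
        return sum(x[t] * math.log2(x[t] / y[t]) for t in vocab if x[t] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def coupling(text_a: str, text_b: str) -> float:
    """Hypothetical coupling score 1 - JSD, clamped to [0, 1]; higher means more shared statistics."""
    score = 1.0 - jsd_bits(token_counts(text_a), token_counts(text_b))
    return min(1.0, max(0.0, score))


def loop_coupling(context: str, response: str, next_prompt: str) -> tuple:
    """Score both links of one context -> response -> next-prompt loop."""
    return coupling(context, response), coupling(response, next_prompt)


def flag_turns(turns: list, threshold: float = 0.2) -> list:
    """Indices of turns where either link falls below a hypothetical threshold."""
    return [i for i, turn in enumerate(turns) if min(loop_coupling(*turn)) < threshold]


coherent = (
    "How do I sort a list in Python?",
    "Use sorted() to sort a list in Python.",
    "Can sorted() sort a list in reverse?",
)
disrupted = (
    "How do I sort a list in Python?",
    "Bananas ripen faster beside apples.",
    "What about pears?",
)
print(loop_coupling(*coherent)[0])  # 0.625
print(flag_turns([coherent, disrupted]))  # [1]
```

The injected off-topic response shares no tokens with its context, so its link scores 0 and the turn is flagged, while the coherent turn keeps both links well above the threshold. A real monitor would likely track scores over a rolling window rather than use one fixed cutoff; that refinement, like everything above, is an assumption rather than something the summary describes.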

Related Papers