Fugu-MT Paper Translation (Abstract): PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

Paper overview: PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

arxiv url: http://arxiv.org/abs/2605.06455v1
Date: Thu, 07 May 2026 15:49:48 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.95762
Title: PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors
Title (reference translation): PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors
Authors: Xinmiao Huang, Jinwei Hu, Rajarshi Roy, Changshun Wu, Yi Dong, Xiaowei Huang
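Abstract summary: Large language model (LLM) agents execute long, tool-using tasks where final outcome checks can arrive too late for intervention. PrefixGuard is a trace-to-monitor framework with an offline StepView induction step followed by supervised monitor training. Across WebArena, $τ^2$-Bench, SkillsBench, and TerminalBench, the strongest PrefixGuard monitors reach 0.900/0.710/0.533/0.557 AUPRC.
Reference score (independently computed attention metric): 14.336100401626062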
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixGuard, a trace-to-monitor framework with an offline StepView induction step followed by supervised monitor training. StepView induces deterministic typed-step adapters from raw trace samples, and the monitor learns an event abstraction and prefix-risk scorer from terminal outcomes. Across WebArena, $τ^2$-Bench, SkillsBench, and TerminalBench, the strongest PrefixGuard monitors reach 0.900/0.710/0.533/0.557 AUPRC. Using the strongest backend within each representation, they improve over raw-text controls by an average of +0.137 AUPRC. LLM judges remain substantially weaker under the same prefix-warning protocol. We also derive an observability ceiling on score-based area under the precision-recall curve (AUPRC) that separates monitor error from failures lacking evidence in the observed prefix. For finite-state audit, post-hoc deterministic finite automaton (DFA) extraction remains compact on WebArena and $τ^2$-Bench (29 and 20 states) but expands to 151 and 187 states on SkillsBench and TerminalBench. Finally, first-alert diagnostics show that strong ranking does not imply deployment utility: WebArena ranks well yet fails to support low-false-alarm alerts, whereas $τ^2$-Bench and TerminalBench retain more actionable early alerts. Together, these results position PrefixGuard as a practical monitor-synthesis recipe with explicit diagnostics for when prefix warnings translate into actionable interventions.
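Abstract (reference translation): Large language model (LLM) agents now carry out long, tool-using tasks in which a final outcome check can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-written event schemas are brittle and LLM judging at deployment time is costly. PrefixGuard is a trace-to-monitor framework with an offline StepView induction step followed by supervised monitor training. StepView induces deterministic typed-step adapters from raw trace samples, and the monitor learns an event abstraction and a prefix-risk scorer from terminal outcomes. Across WebArena, $τ^2$-Bench, SkillsBench, and TerminalBench, the strongest PrefixGuard monitors reach 0.900/0.710/0.533/0.557 AUPRC. Using the strongest backend within each representation, they improve over raw-text controls by an average of +0.137 AUPRC. LLM judges remain substantially weaker under the same prefix-warning protocol. The paper also derives an observability ceiling on score-based area under the precision-recall curve (AUPRC) that separates monitor error from failures lacking evidence in the observed prefix. For finite-state audit, post-hoc deterministic finite automaton (DFA) extraction stays compact on WebArena and $τ^2$-Bench (29 and 20 states) but grows to 151 and 187 states on SkillsBench and TerminalBench. Finally, first-alert diagnostics show that strong ranking does not imply deployment utility: WebArena ranks well yet fails to support low-false-alarm alerts, whereas $τ^2$-Bench and TerminalBench retain more actionable early alerts. Together, these results position PrefixGuard as a practical monitor-synthesis recipe with explicit diagnostics for when prefix warnings translate into actionable interventions.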
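
The abstract reports AUPRC and first-alert diagnostics without implementation details. The following is a minimal sketch of how such quantities could be computed from per-trace risk scores, assuming binary failure labels and a fixed alert threshold. The function names `average_precision` and `first_alert` and all example values are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 = trace ended in failure, 0 = trace succeeded.
    Tied scores are ordered arbitrarily here; this is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one failed trace")
    # Rank traces from highest to lowest risk score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def first_alert(prefix_scores: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first step whose risk score reaches the threshold, else None."""
    for step, score in enumerate(prefix_scores):
        if score >= threshold:
            return step
    return None


# Hypothetical data: four traces, two of which failed.
ap = average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
# Hypothetical per-step risk scores for one trace.
alert_step = first_alert([0.1, 0.2, 0.7, 0.9], threshold=0.5)
```

A high `average_precision` only says that failed traces are ranked above successful ones; `first_alert` shows whether a given threshold fires early enough to act on, which is the distinction the abstract draws between ranking quality and deployment utility.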

Related papers