Fugu-MT Paper Translation (Abstract): Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems

Paper overview: Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems

arxiv url: http://arxiv.org/abs/2509.10401v1
Date: Fri, 12 Sep 2025 16:51:15 GMT
Status: Translation complete
Last updated in system: 2025-09-15 16:03:08.168093
Title: Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems
Title (reference translation): Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems
Authors: Alva West, Yixuan Weng, Minjun Zhu, Zhen Lin, Yue Zhang
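Abstract summary: Failure attribution in multi-agent systems is a critical yet unsolved challenge. Current methods treat it as a pattern recognition task over long conversation logs. A2P Scaffolding transforms failure attribution from pattern recognition into a structured causal inference task.
Reference score (independently computed attention score): 19.51773458179898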
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Failure attribution in multi-agent systems -- pinpointing the exact step where a decisive error occurs -- is a critical yet unsolved challenge. Current methods treat this as a pattern recognition task over long conversation logs, leading to critically low step-level accuracy (below 17%), which renders them impractical for debugging complex systems. Their core weakness is a fundamental inability to perform robust counterfactual reasoning: to determine if correcting a single action would have actually averted the task failure. To bridge this counterfactual inference gap, we introduce Abduct-Act-Predict (A2P) Scaffolding, a novel agent framework that transforms failure attribution from pattern recognition into a structured causal inference task. A2P explicitly guides a large language model through a formal three-step reasoning process within a single inference pass: (1) Abduction, to infer the hidden root causes behind an agent's actions; (2) Action, to define a minimal corrective intervention; and (3) Prediction, to simulate the subsequent trajectory and verify if the intervention resolves the failure. This structured approach leverages the holistic context of the entire conversation while imposing a rigorous causal logic on the model's analysis. Our extensive experiments on the Who&When benchmark demonstrate its efficacy. On the Algorithm-Generated dataset, A2P achieves 47.46% step-level accuracy, a 2.85× improvement over the 16.67% of the baseline. On the more complex Hand-Crafted dataset, it achieves 29.31% step accuracy, a 2.43× improvement over the baseline's 12.07%. By reframing the problem through a causal lens, A2P Scaffolding provides a robust, verifiable, and significantly more accurate solution for automated failure attribution.
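The improvement factors quoted in the abstract can be checked directly from the reported accuracies. The short sketch below only redoes that arithmetic; the dictionary layout and the helper name `improvement_ratio` are illustrative assumptions, not anything taken from the paper.

```python
# Step-level accuracies (in percent) exactly as quoted in the abstract.
# The structure and names here are hypothetical, for illustration only.
reported = {
    "Algorithm-Generated": {"a2p": 47.46, "baseline": 16.67, "claimed": 2.85},
    "Hand-Crafted": {"a2p": 29.31, "baseline": 12.07, "claimed": 2.43},
}


def improvement_ratio(a2p: float, baseline: float) -> float:
    """Return how many times higher the A2P accuracy is than the baseline."""
    return a2p / baseline


# Ratios rounded to two decimals, the precision used in the abstract.
ratios = {
    name: round(improvement_ratio(row["a2p"], row["baseline"]), 2)
    for name, row in reported.items()
}

for name, row in reported.items():
    print(f"{name}: computed {ratios[name]}x, claimed {row['claimed']}x")
```

Both computed ratios agree with the claimed factors at two-decimal precision.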
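Abstract (reference translation): Failure attribution in multi-agent systems, that is, pinpointing the exact step at which a decisive error occurs, is a critical yet unsolved challenge. Current methods treat it as a pattern recognition task over long conversation logs, which yields very low step-level accuracy (below 17%) and makes them impractical for debugging complex systems. Their core weakness is a fundamental inability to perform robust counterfactual reasoning, i.e., to judge whether correcting a single action would actually have averted the task failure. To bridge this counterfactual inference gap, the paper introduces Abduct-Act-Predict (A2P) Scaffolding, a new agent framework that turns failure attribution from pattern recognition into a structured causal inference task. A2P explicitly guides a large language model through a formal three-step reasoning process within a single inference pass: (1) Abduction: infer the hidden root causes behind an agent's actions; (2) Action: define a minimal corrective intervention; (3) Prediction: simulate the subsequent trajectory and verify whether the intervention resolves the failure. This structured approach uses the holistic context of the entire conversation while imposing rigorous causal logic on the model's analysis. Extensive experiments on the Who&When benchmark demonstrate its effectiveness. On the Algorithm-Generated dataset, A2P achieves 47.46% step-level accuracy, a 2.85× improvement over the baseline's 16.67%. On the more complex Hand-Crafted dataset, it achieves 29.31% step accuracy, a 2.43× improvement over the baseline's 12.07%. By reframing the problem through a causal lens, A2P Scaffolding provides a robust, verifiable, and significantly more accurate solution for automated failure attribution.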
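To make the three-stage structure concrete, here is a minimal sketch of how a single prompt could walk a model through Abduction, Action, and Prediction in one pass. It is not the authors' prompt or implementation: the `Step` record, the `A2P_STAGES` wording, and the function `build_a2p_prompt` are all hypothetical, and the stage instructions simply paraphrase the abstract.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One turn of a multi-agent conversation log (hypothetical schema)."""
    index: int
    agent: str
    content: str


# Stage names come from the abstract; the instruction wording is an assumption.
A2P_STAGES = [
    ("Abduction", "Infer the hidden root cause behind the agent's behaviour at the suspected step."),
    ("Action", "Define a minimal corrective intervention for that step."),
    ("Prediction", "Simulate the subsequent trajectory and check whether the intervention resolves the failure."),
]


def build_a2p_prompt(task: str, log: list[Step]) -> str:
    """Assemble one prompt holding the full log plus the three ordered stages."""
    lines = [f"Task: {task}", "", "Conversation log:"]
    lines += [f"[{s.index}] {s.agent}: {s.content}" for s in log]
    lines += ["", "Reason through the following stages in order, in one response:"]
    for n, (name, instruction) in enumerate(A2P_STAGES, start=1):
        lines.append(f"({n}) {name}: {instruction}")
    lines += ["", "Finally, output the index of the decisive error step."]
    return "\n".join(lines)


example_log = [
    Step(0, "planner", "Split the task into search and summarise."),
    Step(1, "searcher", "Returned results for the wrong year."),
    Step(2, "writer", "Summarised the returned results."),
]
prompt = build_a2p_prompt("Report 2023 revenue", example_log)
```

Keeping the whole log in one prompt mirrors the abstract's point that the analysis uses the full conversation context within a single inference pass; how the model's answer is parsed and scored is left out of this sketch.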

Related papers