Fugu-MT Paper Translation (Abstract): SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows

Paper overview: SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows

arxiv url: http://arxiv.org/abs/2606.08049v1
Date: Sat, 06 Jun 2026 08:27:18 GMT
Status: translation complete
System update date: 2026-06-09 14:42:05.69456
Title: SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows
Authors: Amine El Hattami, Nicolas Chapados, Christopher Pal
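Abstract summary: SKILL.nb is a framework for governing reusable agent workflows with evidence-calibrated lifecycle policies. It uses selective formalization: execution evidence decides which workflow steps should become executable code. With gate-conditioned execution, each step runs code when its gates validate, or falls back locally when drift invalidates the executable realization.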
Reference score (independently computed attention score): 16.693609667845948
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI agents increasingly turn past experience into reusable artifacts such as code, workflows, and procedural memories. Reuse can improve efficiency, but it also creates a lifecycle reliability problem: artifacts that succeed once may fail under environment drift, underspecified tasks, or changing task distributions, especially in web automation. We introduce SKILL.nb, a framework for governing reusable agent workflows with evidence-calibrated lifecycle policies. SKILL.nb uses selective formalization: execution evidence decides which workflow steps should become executable code, which should remain natural-language guided, and when those choices should be revised. Workflows are stored as auditable, versioned notebooks that interleave natural-language guidance, multi-language executable cells, validation gates, fallback paths, and multimodal evidence such as outputs, screenshots, and error traces. At runtime, gate-conditioned execution lets each step run code when its gates validate, or fall back locally when drift invalidates the executable realization. On WebArena-Verified, SKILL.nb achieves 53.7% single-round success, improving over the strongest baseline by 3.9 percentage points. Across three re-executions, it retains 91.7% of initially successful tasks, 15.5 points above the next best method. Under bounded repair, it recovers 72.9% of subsequent failures while limiting post-repair regressions to 4.2%, compared with 15.0% to 17.0% for persistent baselines. It also leads on Mind2Web cross-website and cross-domain splits. In a GitLab migration test, SKILL.nb preserves performance when reusing frozen state learned on GitLab 15.7, with frozen-versus-fresh target-version gaps of -1.7 points on GitLab 16.11 and +0.6 points on GitLab 18.9. These results identify lifecycle governance and gate-conditioned execution as reliability axes beyond one-shot task success.
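The gate-conditioned execution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Step` class, `run_step`, `run_workflow`, and the example step are hypothetical names introduced here, and the sketch assumes gates are simple predicates checked before a step's code runs. It only shows the control flow: run the executable cell when every gate validates, otherwise fall back to the natural-language-guided path for that step alone.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


@dataclass
class Step:
    """One workflow step (hypothetical structure, for illustration only)."""
    name: str
    guidance: str                            # natural-language guidance for the step
    gates: List[Callable[[State], bool]]     # validation gates (assumed pre-checks)
    code: Callable[[State], State]           # executable realization of the step
    fallback: Callable[[str, State], State]  # guided path, e.g. an LLM-driven agent


def run_step(step: Step, state: State, log: List[Tuple[str, str]]) -> State:
    # Run the code only if every gate validates against the current state.
    if all(gate(state) for gate in step.gates):
        log.append((step.name, "code"))
        return step.code(state)
    # Otherwise fall back locally: only this step takes the guided path.
    log.append((step.name, "fallback"))
    return step.fallback(step.guidance, state)


def run_workflow(steps: List[Step], state: State) -> Tuple[State, List[Tuple[str, str]]]:
    log: List[Tuple[str, str]] = []
    for step in steps:
        state = run_step(step, state, log)
    return state, log


# Hypothetical step: the gate checks that an expected page element still exists.
steps = [
    Step(
        name="open_issue_form",
        guidance="Open the form for creating a new issue.",
        gates=[lambda s: "new_issue_button" in s["page_elements"]],
        code=lambda s: {**s, "form_open": True, "via": "code"},
        fallback=lambda guidance, s: {**s, "form_open": True, "via": "agent"},
    )
]

ok_state, ok_log = run_workflow(steps, {"page_elements": ["new_issue_button"]})
drift_state, drift_log = run_workflow(steps, {"page_elements": []})
```

In this sketch the first run takes the code path because the gate validates, while the second simulates drift (the button is gone) and takes the fallback path. The per-step log stands in, very loosely, for the execution evidence that the abstract says drives formalization decisions.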
