Fugu-MT Paper Translation (Abstract): Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Paper abstract: Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

arxiv url: http://arxiv.org/abs/2510.09781v1
Date: Fri, 10 Oct 2025 18:42:32 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.621925
Title: Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
Title (reference translation): Building a Foundational Guardrail for General-Purpose Agent Systems Using Synthetic Data
Authors: Yue Huang, Hang Hua, Yujun Zhou, Pengcheng Jing, Manish Nagireddy, Inkit Padhi, Greta Dolcetti, Zhangchen Xu, Subhajit Chaudhury, Ambrish Rawat, Liubov Nedoshivina, Pin-Yu Chen, Prasanna Sattigeri, Xiangliang Zhang
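Abstract summary: LLM agents can plan multi-step tasks, and intervening at the planning stage, before any action is executed, is often the safest way to prevent harm. Existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. We introduce AuraGen, a controllable engine that synthesizes benign trajectories, injects category-labeled risks with calibrated difficulty, and filters outputs via an automated reward model.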
Reference score (independently computed attention score): 76.18834864749606
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: While LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm, since certain risks can lead to severe consequences once carried out. However, existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. To address this challenge, we highlight three critical gaps in current research: data gap, model gap, and evaluation gap. To close the data gap, we introduce AuraGen, a controllable engine that (i) synthesizes benign trajectories, (ii) injects category-labeled risks with calibrated difficulty, and (iii) filters outputs via an automated reward model, producing large and reliable corpora for pre-execution safety. To close the guardian model gap, we propose a foundational guardrail, Safiron, which combines a cross-planner adapter with a compact guardian model. The adapter unifies different input formats, while Safiron flags risky cases, assigns risk types, and generates rationales; trained in two stages with a broadly explored data recipe, Safiron achieves robust transfer across settings. To close the evaluation gap, we release Pre-Exec Bench, a realistic benchmark covering diverse tools and branching trajectories, which measures detection, fine-grained categorization, explanation, and cross-planner generalization in human-verified scenarios. Extensive experiments demonstrate consistent gains of the proposed guardrail over strong baselines on Pre-Exec Bench, and ablations further distill actionable practices, providing a practical template for safer agentic systems.
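Abstract (reference translation): LLM agents can plan multi-step tasks, but intervening at the planning stage, before any action is executed, is often the safest way to prevent harm, because certain risks can have severe consequences once carried out. However, existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. To address this challenge, we highlight three critical gaps in current research: the data gap, the model gap, and the evaluation gap. To close the data gap, we introduce AuraGen, a controllable engine that (i) synthesizes benign trajectories, (ii) injects category-labeled risks with calibrated difficulty, and (iii) filters outputs via an automated reward model, producing large and reliable corpora for pre-execution safety. To close the guardian model gap, we propose Safiron, a foundational guardrail that combines a cross-planner adapter with a compact guardian model. The adapter unifies different input formats, while Safiron flags risky cases, assigns risk types, and generates rationales; trained in two stages with a broadly explored data recipe, Safiron achieves robust transfer across settings. To close the evaluation gap, we release Pre-Exec Bench, a realistic benchmark covering diverse tools and branching trajectories, which measures detection, fine-grained categorization, explanation, and cross-planner generalization in human-verified scenarios. Extensive experiments show consistent gains of the proposed guardrail over strong baselines on Pre-Exec Bench, and ablations further distill actionable practices, providing a practical template for safer agentic systems.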

Related papers