Fugu-MT Paper Translation (Abstract): Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Paper summary: Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

arxiv url: http://arxiv.org/abs/2604.15579v1
Date: Thu, 16 Apr 2026 23:18:22 GMT
Status: Translation complete
Last updated in system: 2026-04-20 22:00:19.676052
Title: Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
Title (reference translation): Symbolic Guardrails for Domain-Specific Agents: Ensuring Safety and Security
Authors: Yining Hong, Yining She, Eunsuk Kang, Christopher S. Timperley, Christian Kästner
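Abstract summary: In high-stakes business settings, unintended actions by AI agents can cause unacceptable harm. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward safety and security guarantees for AI agents.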
Reference score (independently computed attention score): 17.915323061295467
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward strong safety and security guarantees for AI agents. Our three-part study includes a systematic review of 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, an analysis of which policy requirements can be guaranteed by symbolic guardrails, and an evaluation of how symbolic guardrails affect safety, security, and agent success on $τ^2$-Bench, CAR-bench, and MedAgentBench. We find that 85% of benchmarks lack concrete policies, relying instead on underspecified high-level goals or common sense. Among the specified policies, 74% of policy requirements can be enforced by symbolic guardrails, often using simple, low-cost mechanisms. These guardrails improve safety and security without sacrificing agent utility. Overall, our results suggest that symbolic guardrails are a practical and effective way to guarantee some safety and security requirements, especially for domain-specific AI agents. We release all code and artifacts at https://github.com/hyn0027/agent-symbolic-guardrails.
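Abstract (reference translation): AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward guaranteeing the safety and security of AI agents. Our three-part study systematically reviews 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, analyzes which policy requirements symbolic guardrails can guarantee, and evaluates the effect of symbolic guardrails on safety, security, and agent success on $τ^2$-Bench, CAR-bench, and MedAgentBench. We find that 85% of benchmarks lack concrete policies and rely instead on underspecified high-level goals or common sense. Among the specified policies, 74% of policy requirements can be enforced by symbolic guardrails, often using simple, low-cost mechanisms. These guardrails improve safety and security without sacrificing agent utility. Overall, the results indicate that symbolic guardrails are a practical and effective way to guarantee some safety and security requirements, especially for domain-specific AI agents. All code and artifacts are released at https://github.com/hyn0027/agent-symbolic-guardrails.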
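
Illustrative note: the abstract states that many policy requirements can be enforced by symbolic guardrails using simple, low-cost mechanisms. The following is a minimal sketch of what such a mechanism might look like, not the authors' implementation. All names here (`ToolCall`, `Guardrail`, `refund_limit`, the `issue_refund` tool, and the 100.0 limit) are hypothetical and chosen only for illustration. The idea is that a deterministic rule inspects each tool call before it runs and blocks it when a policy is violated, independent of what the underlying model decides.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical structure)."""
    name: str
    args: Dict[str, Any]


# A rule returns None when the call is allowed, or a reason string when denied.
Rule = Callable[[ToolCall], Optional[str]]


def refund_limit(max_amount: float) -> Rule:
    """Hypothetical policy: refunds above max_amount are never allowed."""
    def rule(call: ToolCall) -> Optional[str]:
        if call.name == "issue_refund" and call.args.get("amount", 0) > max_amount:
            return f"refund exceeds limit of {max_amount}"
        return None
    return rule


class Guardrail:
    """Checks every tool call against symbolic rules before executing it."""

    def __init__(self, rules: Iterable[Rule]) -> None:
        self.rules: List[Rule] = list(rules)

    def check(self, call: ToolCall) -> List[str]:
        # Collect the reasons from every rule that rejects this call.
        reasons = []
        for rule in self.rules:
            reason = rule(call)
            if reason is not None:
                reasons.append(reason)
        return reasons

    def execute(self, call: ToolCall, tools: Dict[str, Callable[..., Any]]) -> Any:
        violations = self.check(call)
        if violations:
            # The call is blocked deterministically; the tool never runs.
            raise PermissionError("; ".join(violations))
        return tools[call.name](**call.args)
```

In this sketch the check does not depend on model behavior: any call that matches the rule is rejected regardless of the prompt, which is the kind of guarantee the abstract contrasts with training-based methods and neural guardrails. Whether a real policy can be expressed this way depends on how concretely it is specified.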


Related papers