Fugu-MT paper translation (abstract): Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Paper overview: Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

arxiv url: http://arxiv.org/abs/2605.23989v1
Date: Sun, 17 May 2026 10:26:37 GMT
Status: translation complete
System last updated: 2026-05-26 19:50:17.442947
Title: Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security
Title (reference translation): Toward Trustworthy Agentic AI: A Comprehensive Survey of Safety, Robustness, Privacy, and System Security
Authors: Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu, Dianzhi Yu, Shicheng Ma, Wenqian Cui, Yiyang Zhao, Yiyi Chen, Ruoxi Jiang, Irwin King, Zenglin Xu
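Abstract summary: Agentic AI systems execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness. This survey examines trustworthy agentic AI through two core dimensions that are critical for high-risk deployments. For each dimension, it clarifies key concepts, identifies where risks emerge along the agent workflow, and summarizes stage-targeted mitigation strategies.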
Reference score (independently computed attention level): 57.35851886874902
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic AI systems -- Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions -- can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness. This survey provides a focused examination of trustworthy agentic AI through two core dimensions that are critical for high-risk deployments: Safety and Robustness, and Privacy and System Security. For each dimension, we clarify key concepts, identify where risks emerge along the agent workflow, and summarize stage-targeted mitigation strategies. Other trustworthiness aspects (value alignment, transparency, fairness, and accountability) are discussed as relevant context rather than parallel chapters. To support consistent comparison and deployment decisions, we consolidate evaluation into a unified metrics-and-benchmarks hub, emphasizing both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offering scenario-to-metric guidance for release gating. We conclude by outlining open challenges such as self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off, and present a case study of real-world security failures in open-source agentic systems. Our goal is to serve as a practical reference for researchers and practitioners building trustworthy agentic systems in high-stakes environments.
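Abstract (reference translation): Agentic AI systems -- Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions -- can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness. This survey examines trustworthy agentic AI through two core dimensions that are critical for high-risk deployments: Safety and Robustness, and Privacy and System Security. For each dimension, it clarifies key concepts, identifies where risks emerge along the agent workflow, and summarizes stage-targeted mitigation strategies. Other trustworthiness aspects (value alignment, transparency, fairness, and accountability) are discussed as relevant context rather than as parallel chapters. To support consistent comparison and deployment decisions, evaluation is consolidated into a unified metrics-and-benchmarks hub that emphasizes both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offers scenario-to-metric guidance for release gating. The survey concludes by outlining open challenges such as self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off, and presents a case study of real-world security failures in open-source agentic systems. The goal is to serve as a practical reference for researchers and practitioners building trustworthy agentic systems in high-stakes environments.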

Related papers