Fugu-MT 論文翻訳(概要): VeriGrey: Greybox Agent Validation

論文の概要: VeriGrey: Greybox Agent Validation

arxiv url: http://arxiv.org/abs/2603.17639v1
Date: Wed, 18 Mar 2026 12:00:54 GMT
ステータス: 翻訳完了
システム内更新日: 2026-03-19 18:32:57.681978
Title: VeriGrey: Greybox Agent Validation
Title（参考訳）: VeriGrey: Greybox Agent Validation
Authors: Yuntong Zhang, Sungmin Kang, Ruijie Meng, Marcel Böhme, Abhik Roychoudhury,
Abstract要約: LLMエージェントの多様な動作を探索し,セキュリティリスクを明らかにするためのグレーボックスアプローチを提案する。我々のアプローチでは、VeriGreyはフィードバック関数として呼び出された一連のツールを使ってテストプロセスを動かします。また、広く使われているコーディングエージェントであるGemini CLIや、有名なOpenClawパーソナルアシスタントによる実世界のケーススタディも行っています。
参考スコア（独自算出の注目度）: 21.512659070355145
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic AI has been a topic of great interest recently. A Large Language Model (LLM) agent involves one or more LLMs in the back-end. In the front end, it conducts autonomous decision-making by combining the LLM outputs with results obtained by invoking several external tools. The autonomous interactions with the external environment introduce critical security risks. In this paper, we present a grey-box approach to explore diverse behaviors and uncover security risks in LLM agents. Our approach VeriGrey uses the sequence of tools invoked as a feedback function to drive the testing process. This helps uncover infrequent but dangerous tool invocations that cause unexpected agent behavior. As mutation operators in the testing process, we mutate prompts to design pernicious injection prompts. This is carefully accomplished by linking the task of the agent to an injection task, so that the injection task becomes a necessary step of completing the agent functionality. Comparing our approach with a black-box baseline on the well-known AgentDojo benchmark, VeriGrey achieves 33% additional efficacy in finding indirect prompt injection vulnerabilities with a GPT-4.1 back-end. We also conduct real-world case studies with the widely used coding agent Gemini CLI, and the well-known OpenClaw personal assistant. VeriGrey finds prompts inducing several attack scenarios that could not be identified by black-box approaches. In OpenClaw, by constructing a conversation agent which employs mutational fuzz testing as needed, VeriGrey is able to discover malicious skill variants from 10 malicious skills (with 10/10= 100% success rate on the Kimi-K2.5 LLM backend, and 9/10= 90% success rate on Opus 4.6 LLM backend). This demonstrates the value of a dynamic approach like VeriGrey to test agents, and to eventually lead to an agent assurance framework.
Abstract（参考訳）: 最近、エージェントAIは大きな関心を集めている。 LLM(Large Language Model)エージェントは、バックエンドに1つ以上のLLMを含む。フロントエンドでは、LCM出力と複数の外部ツールを呼び出した結果を組み合わせることで、自律的な意思決定を行う。外部環境との自律的な相互作用は、重大なセキュリティリスクをもたらす。本稿では, LLMエージェントの多様な動作を探索し, セキュリティリスクを明らかにするため, グレーボックスアプローチを提案する。我々のアプローチでは、VeriGreyはフィードバック関数として呼び出された一連のツールを使ってテストプロセスを動かします。これは、予期しないエージェントの振る舞いを引き起こす頻繁だが危険なツール呼び出しを明らかにするのに役立ちます。テストプロセスにおける突然変異演算子として、悪質なインジェクションプロンプトを設計するプロンプトを変異させる。これは、エージェントのタスクをインジェクションタスクにリンクすることで、エージェント機能を完了させるために必要なステップとなるように、慎重に達成される。よく知られているAgentDojoベンチマークのブラックボックスベースラインと比較すると、VeriGreyはGPT-4.1バックエンドで間接的なインジェクション脆弱性を見つける上で、さらに33%の有効性を実現している。また、広く使われているコーディングエージェントであるGemini CLIや、有名なOpenClawパーソナルアシスタントによる実世界のケーススタディも行っています。 VeriGreyは、ブラックボックスアプローチでは特定できないいくつかの攻撃シナリオを誘発するプロンプトを見つける。 OpenClawでは、必要に応じて突然変異ファズテストを利用する会話エージェントを構築することで、VeriGreyは悪意のある10のスキルから悪意のあるスキルの変種を発見することができる(Kim-K2.5 LLMバックエンドでは10/10=100%、Opus 4.6 LLMバックエンドでは9/10=90%)。これは、テストエージェントに対するVeriGreyのような動的アプローチの価値を示し、最終的にはエージェント保証フレームワークにつながる。

論文の概要: VeriGrey: Greybox Agent Validation

関連論文リスト