Fugu-MT Paper Translation (Abstract): Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks

Paper overview: Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks

arxiv url: http://arxiv.org/abs/2602.05066v1
Date: Wed, 04 Feb 2026 21:38:38 GMT
Status: Translation complete
Last updated in system: 2026-02-06 18:49:08.629586
Title: Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks
Authors: Jafar Isbarov, Murat Kantarcioglu
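Abstract summary: Current defenses rely on monitoring protocols that jointly evaluate an agent's Chain-of-Thought (CoT) and tool-use actions to ensure alignment with user intent. The paper demonstrates that these monitoring-based defenses can be bypassed via a novel Agent-as-a-Proxy attack. The findings suggest that current monitoring-based agentic defenses are fundamentally fragile regardless of model scale.
Reference score (independently computed attention score): 12.356708678431183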
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As AI agents automate critical workloads, they remain vulnerable to indirect prompt injection (IPI) attacks. Current defenses rely on monitoring protocols that jointly evaluate an agent's Chain-of-Thought (CoT) and tool-use actions to ensure alignment with user intent. We demonstrate that these monitoring-based defenses can be bypassed via a novel Agent-as-a-Proxy attack, where prompt injection attacks treat the agent as a delivery mechanism, bypassing both agent and monitor simultaneously. While prior work on scalable oversight has focused on whether small monitors can supervise large agents, we show that even frontier-scale monitors are vulnerable. Large-scale monitoring models like Qwen2.5-72B can be bypassed by agents with similar capabilities, such as GPT-4o mini and Llama-3.1-70B. On the AgentDojo benchmark, we achieve a high attack success rate against AlignmentCheck and Extract-and-Evaluate monitors under diverse monitoring LLMs. Our findings suggest current monitoring-based agentic defenses are fundamentally fragile regardless of model scale.

Related papers