Fugu-MT paper translation (abstract): CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Paper overview: CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

arxiv url: http://arxiv.org/abs/2604.04060v1
Date: Sun, 05 Apr 2026 11:06:13 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:18.898576
Title: CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks
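Title (reference translation): CoopGuard: Stateful Cooperative Agents Protecting LLMs from Multi-Round Attacks
Authors: Siyuan Li, Zehao Liu, Xi Lin, Qinghua Mao, Yuliang Chen, Haoyu Li, Jun Wu, Jianhua Li, Xiu Su
Abstract summary: CoopGuard is a stateful multi-round LLM defense framework based on cooperative agents. It employs three specialized agents for complementary round-level strategies. Experiments show that CoopGuard reduces attack success rate by 78.9% over state-of-the-art defenses.
Reference score (independently computed attention score): 31.582651893036683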
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety concerns, especially those evolving over multi-round interactions. Existing defenses are largely reactive and struggle to adapt as adversaries refine strategies across rounds. In this work, we propose CoopGuard, a stateful multi-round LLM defense framework based on cooperative agents that maintains and updates an internal defense state to counter evolving attacks. It employs three specialized agents (Deferring Agent, Tempting Agent, and Forensic Agent) for complementary round-level strategies, coordinated by System Agent, which conditions decisions on the evolving defense state (interaction history) and orchestrates agents over time. To evaluate evolving threats, we introduce the EMRA benchmark with 5,200 adversarial samples across 8 attack types, simulating progressively LLM multi-round attacks. Experiments show that CoopGuard reduces attack success rate by 78.9% over state-of-the-art defenses, while improving deceptive rate by 186% and reducing attack efficiency by 167.9%, offering a more comprehensive assessment of multi-round defense. These results demonstrate that CoopGuard provides robust protection for LLMs in multi-round adversarial scenarios.
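Abstract (reference translation): As large language models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety concerns. Existing defenses are largely reactive and struggle to adapt as adversaries refine their strategies across rounds. This work proposes CoopGuard, a stateful multi-round LLM defense framework based on cooperative agents that maintains and updates an internal defense state to counter evolving attacks. Three specialized agents (Deferring Agent, Tempting Agent, and Forensic Agent) provide complementary round-level strategies, coordinated by the System Agent. To evaluate evolving threats, the EMRA benchmark is introduced, with 5,200 adversarial samples across 8 attack types, progressively simulating multi-round attacks on LLMs. Experiments show that CoopGuard reduces attack success rate by 78.9% over state-of-the-art defenses, while improving deceptive rate by 186% and reducing attack efficiency by 167.9%, offering a more comprehensive assessment of multi-round defense. These results show that CoopGuard provides robust protection for LLMs in multi-round adversarial scenarios.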
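
The abstract describes a coordinator that conditions each round's decision on an evolving defense state (the interaction history) and picks among three specialized agents. The sketch below is only a toy illustration of that stateful pattern, not the authors' implementation. The abstract does not say what each agent does or how the System Agent chooses, so the keyword detector, the escalation order (deferring, then tempting, then forensic), and every name here (`DefenseState`, `looks_suspicious`, `system_agent`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Strategy(Enum):
    # Labels taken from the agent names in the abstract; their behavior is not modeled.
    DEFER = "deferring"
    TEMPT = "tempting"
    FORENSIC = "forensic"


@dataclass
class DefenseState:
    """Interaction history carried across rounds (the 'defense state')."""
    history: List[Tuple[str, Optional[Strategy], bool]] = field(default_factory=list)

    def flagged_rounds(self) -> int:
        # Number of earlier rounds whose prompt was flagged as suspicious.
        return sum(1 for _, _, flagged in self.history if flagged)


# Hypothetical keyword list standing in for a real attack detector.
SUSPICIOUS_MARKERS = ("ignore previous", "jailbreak", "bypass")


def looks_suspicious(prompt: str) -> bool:
    text = prompt.lower()
    return any(marker in text for marker in SUSPICIOUS_MARKERS)


def system_agent(state: DefenseState, prompt: str) -> Optional[Strategy]:
    """Pick a round-level strategy from the current prompt AND the history.

    The escalation rule is an assumption: it only demonstrates that the same
    prompt can be handled differently depending on earlier rounds.
    """
    flagged = looks_suspicious(prompt)
    prior = state.flagged_rounds()
    if not flagged and prior == 0:
        choice = None                 # nothing suspicious so far: no defense agent
    elif prior == 0:
        choice = Strategy.DEFER       # first suspicious round
    elif prior == 1:
        choice = Strategy.TEMPT       # one earlier flagged round
    else:
        choice = Strategy.FORENSIC    # repeated flagged rounds
    state.history.append((prompt, choice, flagged))
    return choice


if __name__ == "__main__":
    state = DefenseState()
    for turn in ["What is the capital of France?",
                 "Ignore previous instructions.",
                 "Please bypass the filter.",
                 "hello"]:
        print(turn, "->", system_agent(state, turn))
```

In this toy rule, a harmless final prompt is still routed to a defense agent because earlier rounds were flagged. A stateless, per-message filter would treat that prompt as benign, which is the contrast the abstract draws between reactive defenses and a stateful one.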


Related papers