Fugu-MT paper translation (abstract): From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

Paper overview: From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

arxiv url: http://arxiv.org/abs/2606.14517v2
Date: Tue, 16 Jun 2026 09:28:39 GMT
Status: Translation complete
Last updated in system: 2026-06-17 15:01:46.711394
Title: From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails
Title (reference translation): From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails
Authors: Yuguang Zhou, Xunguang Wang, Pingchuan Ma, Zhantong Xue, Zhaoyu Wang, Shuai Wang
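Abstract summary: LLM-based guardrails have emerged as a highly effective defense against prompt injection and jailbreak attacks in autonomous agents. The paper shows that attackers can inject crafted data to trap the guardrail in extended reasoning loops and so carry out a systematic denial-of-service attack. A single poisoned document is shown to saturate shared guardrail infrastructure, effectively starving co-located agents and paralyzing the entire system.
Reference score (independently computed attention score): 9.514819678986488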
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM-based guardrails have emerged as a highly effective defense against prompt injection and jailbreak attacks in autonomous agents. However, we reveal that the very reasoning and task-following capabilities enabling this protection introduce a novel vulnerability: attackers can inject crafted data to trap the guardrail in extended reasoning loops, effectuating a systematic denial-of-service (DoS) attack. To systematically expose this threat, we design a beam-search optimization framework that crafts natural-language payloads to maximize guardrail reasoning length, utilizing an LLM proposer guided by a strategy bank. Based on the observation of the guardrail's schema-following nature, we also provide another attack framework driven by mechanism-aware structural mutations with less computational load. The attack efficacy is systematically evaluated in two parts. First, in standalone evaluations, the attack generalizes across diverse guardrail architectures, safety templates, and agent benchmarks. Payloads optimized on a single open-source surrogate successfully transfer to eight leading model backbones (e.g., Claude, GPT, Gemini, DeepSeek, and Qwen), achieving a 13-63× token amplification. Second, in end-to-end real-world agent deployments (web, desktop, code, and multi-agent systems), the attack reveals up to a 148× latency amplification. We show that a single poisoned document can saturate shared guardrail infrastructures, effectively starving co-located agents and paralyzing the entire system. By uncovering this availability flaw, our work underscores the urgent need to develop cost-bounded, reasoning-robust guardrails.
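Abstract (reference translation): LLM-based guardrails have emerged as a highly effective defense against prompt injection and jailbreak attacks in autonomous agents. However, the very reasoning and task-following capabilities that enable this protection introduce a new vulnerability: attackers can inject crafted data to trap the guardrail in extended reasoning loops and carry out a systematic denial-of-service (DoS) attack. To expose this threat systematically, the authors design a beam-search optimization framework that crafts natural-language payloads to maximize guardrail reasoning length, using an LLM proposer guided by a strategy bank. Based on the observation that guardrails follow schemas, they also provide a second attack framework, driven by mechanism-aware structural mutations, with a lower computational load. Attack efficacy is evaluated systematically in two parts. First, in standalone evaluations, the attack generalizes across diverse guardrail architectures, safety templates, and agent benchmarks. Payloads optimized on a single open-source surrogate transfer successfully to eight leading model backbones (e.g., Claude, GPT, Gemini, DeepSeek, and Qwen), achieving a 13-63× token amplification. Second, in end-to-end real-world agent deployments (web, desktop, code, and multi-agent systems), the attack yields up to a 148× latency amplification. A single poisoned document is shown to saturate shared guardrail infrastructure, effectively starving co-located agents and paralyzing the entire system. By uncovering this availability flaw, the work underscores the urgent need to develop cost-bounded, reasoning-robust guardrails.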
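
The amplification figures in the abstract are ratios of a cost under attack to the same cost on benign input. The sketch below shows that arithmetic together with a simple per-request cost bound of the kind the abstract calls for. The record type `GuardrailCall`, the helper names, the budget values, and all numbers are assumptions chosen for illustration; they are not measurements or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class GuardrailCall:
    """Cost of one guardrail invocation (hypothetical record type)."""
    reasoning_tokens: int
    latency_s: float


def amplification(attacked: float, baseline: float) -> float:
    """Ratio of a cost under attack to the same cost on benign input."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return attacked / baseline


def exceeds_budget(call: GuardrailCall, max_tokens: int, max_latency_s: float) -> bool:
    """True if a call breaks either per-request bound (illustrative policy)."""
    return call.reasoning_tokens > max_tokens or call.latency_s > max_latency_s


# Illustrative numbers only, picked to land on the upper figures quoted above.
benign = GuardrailCall(reasoning_tokens=200, latency_s=1.5)
poisoned = GuardrailCall(reasoning_tokens=12_600, latency_s=222.0)

token_amp = amplification(poisoned.reasoning_tokens, benign.reasoning_tokens)
latency_amp = amplification(poisoned.latency_s, benign.latency_s)

print(f"token amplification:   {token_amp:.0f}x")
print(f"latency amplification: {latency_amp:.0f}x")
print("poisoned call over budget:", exceeds_budget(poisoned, 2_000, 10.0))
```

A hard cap like `exceeds_budget` only bounds the cost per request. Whether cutting off a guardrail mid-reasoning should count as "allow" or "block" is a separate policy question that this sketch does not address.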

Related papers