Fugu-MT 論文翻訳(概要): Automating Agent Hijacking via Structural Template Injection

論文の概要: Automating Agent Hijacking via Structural Template Injection

arxiv url: http://arxiv.org/abs/2602.16958v1
Date: Wed, 18 Feb 2026 23:52:14 GMT
ステータス: 翻訳完了
システム内更新日: 2026-02-20 15:21:28.523841
Title: Automating Agent Hijacking via Structural Template Injection
Title（参考訳）: 構造テンプレート注入による自動エージェントハイジャック
Authors: Xinhao Deng, Jiaqing Wu, Miao Chen, Yue Xiao, Ke Xu, Qi Li,
Abstract要約: エージェントハイジャックは、Large Language Model (LLM)エコシステムにとって重要な脅威であり、悪意のある命令を検索されたコンテンツに注入することで、敵が実行を操作できるようにする。 LLMエージェントの基本的構造機構をターゲットにした自動エージェントハイジャックフレームワークPhantomを提案する。最適化されたテンプレートを検索されたコンテキストに注入することにより、ロールの混乱を誘発し、インジェクトされたコンテンツを正規のユーザ命令や以前のツール出力と誤解釈させる。
参考スコア（独自算出の注目度）: 18.856564341900555
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. To enhance attack transferability against black-box agents, Phantom introduces a novel attack template search framework. We first perform multi-level template augmentation to increase structural diversity and then train a Template Autoencoder (TAE) to embed discrete templates into a continuous, searchable latent space. Subsequently, we apply Bayesian optimization to efficiently identify optimal adversarial vectors that are decoded into high-potency structured templates. Extensive experiments on Qwen, GPT, and Gemini demonstrate that our framework significantly outperforms existing baselines in both Attack Success Rate (ASR) and query efficiency. Moreover, we identified over 70 vulnerabilities in real-world commercial products that have been confirmed by vendors, underscoring the practical severity of structured template-based hijacking and providing an empirical foundation for securing next-generation agentic systems.
Abstract（参考訳）: OWASPによってLarge Language Model (LLM)エコシステムに対する重要な脅威として強調されたエージェントハイジャックは、悪意のある命令を検索されたコンテンツに注入することで、敵が実行を操作できるようにする。既存の攻撃のほとんどは手作業によるセマンティクス駆動のプロンプト操作に依存しており、攻撃の成功率が低く、クローズドソースの商用モデルへの転送性が制限されることが多い。本稿では,構造化テンプレートインジェクションをベースとした自動エージェントハイジャックフレームワークPhantomを提案する。私たちの重要な洞察は、エージェントが特定のチャットテンプレートトークンを使用して、システム、ユーザ、アシスタント、ツール命令を分離することです。最適化されたテンプレートを検索されたコンテキストに注入することにより、ロールの混乱を誘発し、インジェクトされたコンテンツを正規のユーザ命令や以前のツール出力と誤解釈させる。ブラックボックスエージェントに対する攻撃伝達性を高めるため、Phantomは新たな攻撃テンプレート検索フレームワークを導入した。まず、構造的多様性を高めるためにマルチレベルテンプレート拡張を行い、次にテンプレートオートエンコーダ(TAE)をトレーニングして、個別のテンプレートを連続して検索可能な潜在空間に埋め込む。次にベイズ最適化を適用し、高能率構造化テンプレートにデコードされた最適逆ベクトルを効率的に同定する。 Qwen, GPT, Geminiの大規模な実験により、我々のフレームワークはアタック成功率(ASR)とクエリ効率の両方において、既存のベースラインを大幅に上回っていることが実証された。さらに、ベンダーが確認した現実世界の商用製品に70以上の脆弱性を特定し、構造化テンプレートベースのハイジャックの実用的深刻さを強調し、次世代のエージェントシステムを保護するための実証的な基盤を提供する。

論文の概要: Automating Agent Hijacking via Structural Template Injection

関連論文リスト