Fugu-MT Paper Translation (Summary): Conjunctive Prompt Attacks in Multi-Agent LLM Systems

Paper summary: Conjunctive Prompt Attacks in Multi-Agent LLM Systems

arxiv url: http://arxiv.org/abs/2604.16543v1
Date: Fri, 17 Apr 2026 02:31:09 GMT
Status: Translation complete
Last updated (in system): 2026-04-22 14:04:47.722457
Title: Conjunctive Prompt Attacks in Multi-Agent LLM Systems
Authors: Nokimul Hasan Arif, Qian Lou, Mengxin Zheng,
Abstract summary: Inter-agent routing creates attack surfaces that single-agent evaluations miss. This work studies attacks in which a trigger key in the user query and a hidden adversarial template in a compromised remote agent each appear benign on their own but activate harmful behavior when routing brings them together.
Reference score (attention score computed by the site): 16.735743806437487
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most LLM safety work studies single-agent models, but many real applications rely on multiple interacting agents. In these systems, prompt segmentation and inter-agent routing create attack surfaces that single-agent evaluations miss. We study conjunctive prompt attacks, where a trigger key in the user query and a hidden adversarial template in one compromised remote agent each appear benign alone but activate harmful behavior when routing brings them together. We consider an attacker who changes neither model weights nor the client agent and instead controls only trigger placement and template insertion. Across star, chain, and DAG topologies, routing-aware optimization substantially increases attack success over non-optimized baselines while keeping false activations low. Existing defenses, including PromptGuard, Llama-Guard variants, and system-level controls such as tool restrictions, do not reliably stop the attack because no single component appears malicious in isolation. These results expose a structural vulnerability in agentic LLM pipelines and motivate defenses that reason over routing and cross-agent composition. Code is available at https://github.com/UCF-ML-Research/ConjunctiveAgents.
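The core mechanism in the abstract is that two pieces are each benign but become harmful once routing places them in the same agent's context. The toy Python sketch below illustrates why checking components one at a time misses this. It is not the authors' implementation (their code is in the linked repository). The names `Agent`, `route`, `conjunction`, `TRIGGER`, and `TEMPLATE`, the rule that every agent forwards the incoming message unchanged, and the string-match "activation" are all hypothetical simplifications. The placeholders contain no real attack content, and no model is involved.

```python
"""Toy model of a conjunctive trigger in a multi-agent routing graph.

Hypothetical sketch: TRIGGER and TEMPLATE are placeholder strings, not real
attack content, and no language model is involved.
"""
from dataclasses import dataclass, field

TRIGGER = "<trigger-key>"       # placeholder for the key hidden in the user query
TEMPLATE = "<hidden-template>"  # placeholder for the template in a compromised agent


@dataclass
class Agent:
    name: str
    system_text: str                                     # text this agent adds to its context
    downstream: list[str] = field(default_factory=list)  # agents it routes to


def conjunction(text: str) -> bool:
    """Toy activation rule: fires only when both pieces appear in one context."""
    return TRIGGER in text and TEMPLATE in text


def route(agents: dict[str, Agent], root: str, query: str) -> dict[str, str]:
    """Propagate the query from `root` through a star, chain, or DAG topology.

    Simplifying assumption: every agent forwards the incoming message unchanged,
    and its composed context is its own system text plus that message.
    Each agent is visited once, so shared DAG descendants are not duplicated.
    """
    contexts: dict[str, str] = {}
    frontier = [(root, query)]
    while frontier:
        name, message = frontier.pop()
        if name in contexts:
            continue
        agent = agents[name]
        contexts[name] = agent.system_text + "\n" + message
        frontier.extend((nxt, message) for nxt in agent.downstream)
    return contexts


def activated_agents(agents: dict[str, Agent], root: str, query: str) -> list[str]:
    """Names of agents whose composed context satisfies the conjunction."""
    return sorted(n for n, ctx in route(agents, root, query).items() if conjunction(ctx))


def flagged_in_isolation(agents: dict[str, Agent], query: str) -> bool:
    """Per-component check: inspects the query and each system text separately."""
    parts = [query] + [a.system_text for a in agents.values()]
    return any(conjunction(p) for p in parts)


def build_star() -> dict[str, Agent]:
    """Client fans out to three remote agents; r2 is the compromised one."""
    return {
        "client": Agent("client", "route the request", ["r1", "r2", "r3"]),
        "r1": Agent("r1", "summarize"),
        "r2": Agent("r2", "translate " + TEMPLATE),
        "r3": Agent("r3", "search"),
    }


if __name__ == "__main__":
    star = build_star()
    triggered = "draft my report " + TRIGGER
    benign = "draft my report"
    print(flagged_in_isolation(star, triggered))        # False: no single part is flagged
    print(activated_agents(star, "client", triggered))  # ['r2']
    print(activated_agents(star, "client", benign))     # []: no activation without the trigger
```

Checking each part separately returns False, yet `r2`'s composed context activates. This mirrors the abstract's point that no single component appears malicious in isolation. Running the same check on the composed contexts returned by `route`, rather than on the individual parts, is the kind of composition-aware check the abstract motivates. In this toy that check is trivial only because the placeholders are known strings.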

