Fugu-MT Paper Translation (Summary): ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

Paper summary: ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

arxiv url: http://arxiv.org/abs/2605.03378v1
Date: Tue, 05 May 2026 05:37:00 GMT
Status: Translation complete
Last updated in system: 2026-05-06 19:35:43.780217
Title: ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
Title (reference translation): ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
Authors: Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie, Jiongchi Yu, Jia Liu,
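Abstract summary: AgentLure is a benchmark that captures context-dependent tasks and context-aware prompt injection attacks. Because existing defenses perform poorly in this setting, the authors propose ARGUS, a defense mechanism that enforces provenance-aware decision auditing for LLM agents.
Reference score (independently computed attention metric): 28.414099578635373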
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rise of Large Language Model (LLM) agents, augmented with tool use, skills, and external knowledge, has introduced new security risks. Among them, prompt injection attacks, where adversaries embed malicious instructions into the agent workflow, have emerged as the primary threat. However, existing benchmarks and defenses are fundamentally limited as they assume context-insensitive settings in which the agent works under a fully specified user instruction, and the attacks are straightforward and context-independent. As a result, they fail to capture real-world deployments where agent behavior usually depends on dynamic context, not just the user prompt, and adversaries can adapt their attacks to different contexts. Similarly, existing defenses built on this narrow threat model overlook the nature of real-world agent delegation. In this paper, we present AgentLure, a benchmark that captures context-dependent tasks and context-aware prompt injection attacks. AgentLure spans four agentic domains and eight attack vectors across diverse attack surfaces. Our evaluation shows that existing defenses often struggle in this setting, yielding poor performance against such attacks in agentic systems. To address this limitation, we propose ARGUS, a defense mechanism that enforces provenance-aware decision auditing for LLM agents. ARGUS constructs an influence provenance graph to track how untrusted context propagates into agent decisions and to verify whether a decision is justified by trustworthy evidence before execution. Our evaluation shows ARGUS reduces attack success rate to 3.8% while preserving 87.5% task utility, significantly outperforming existing defenses and remaining robust against adaptive white-box adversaries.
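Abstract (reference translation): The rise of Large Language Model (LLM) agents augmented with tool use, skills, and external knowledge has introduced new security risks. Among these, prompt injection attacks, which embed malicious instructions into the agent workflow, have emerged as the primary threat. However, existing benchmarks and defenses assume context-insensitive settings in which the agent operates under a fully specified user instruction and the attacks are simple and context-independent. As a result, they fail to capture real deployments, where agent behavior depends on dynamic context as well as the user prompt and adversaries can adapt their attacks to different contexts. Likewise, existing defenses built on this narrow threat model overlook the nature of real-world agent delegation. In this paper, we present AgentLure, a benchmark that captures context-dependent tasks and context-aware prompt injection attacks. AgentLure spans four agentic domains and eight attack vectors across diverse attack surfaces. Our evaluation shows that existing defenses often struggle in this setting and perform poorly against such attacks in agentic systems. To address this limitation, we propose ARGUS, a defense mechanism that enforces provenance-aware decision auditing for LLM agents. ARGUS constructs an influence provenance graph to track how untrusted context propagates into agent decisions and to verify, before execution, whether a decision is justified by trustworthy evidence. In the evaluation, ARGUS reduces the attack success rate to 3.8% while preserving 87.5% task utility, significantly outperforming existing defenses and remaining robust against adaptive white-box adversaries.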
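
Illustrative sketch (not from the paper): the abstract describes an influence provenance graph that tracks how untrusted context reaches a decision, but it does not specify the graph's structure or the verification rule. The toy Python below shows one possible reading. The names `Node`, `ProvenanceGraph`, `sources`, and `is_justified`, as well as the strict "every root source must be trusted" policy, are assumptions made purely for illustration and should not be taken as the ARGUS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One piece of context or one agent step (hypothetical schema)."""
    name: str
    trusted: bool                                 # e.g. True for the user instruction
    parents: list = field(default_factory=list)   # names of nodes that influenced this one


class ProvenanceGraph:
    """Toy influence graph: edges point from a node to the nodes that influenced it."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, trusted=False, parents=()):
        self.nodes[name] = Node(name, trusted, list(parents))

    def sources(self, name):
        """Return the root nodes (no parents) that transitively influence `name`."""
        roots, stack, seen = set(), [name], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            node = self.nodes[current]
            if not node.parents:
                roots.add(current)
            stack.extend(node.parents)
        return roots

    def is_justified(self, decision):
        """Assumed policy: allow a decision only if every root source is trusted."""
        return all(self.nodes[root].trusted for root in self.sources(decision))


# Usage: an email action influenced by an untrusted web page is rejected.
g = ProvenanceGraph()
g.add("user_instruction", trusted=True)
g.add("web_page", trusted=False)
g.add("plan_summarize", parents=["user_instruction"])
g.add("send_email", parents=["web_page", "plan_summarize"])

print(g.is_justified("plan_summarize"))  # True
print(g.is_justified("send_email"))      # False
```

The all-roots-trusted rule is deliberately strict and would block many benign actions that merely read external content. A practical defense would need a finer notion of which evidence justifies which decision, and the abstract does not describe that rule.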

Related papers