Multi-Agent Systems Execute Arbitrary Malicious Code
- URL: http://arxiv.org/abs/2503.12188v1
- Date: Sat, 15 Mar 2025 16:16:08 GMT
- Title: Multi-Agent Systems Execute Arbitrary Malicious Code
- Authors: Harold Triedman, Rishi Jha, Vitaly Shmatikov,
- Abstract summary: We show that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities.<n>We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection.
- Score: 9.200635465485067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent systems coordinate LLM-based agents to perform tasks on users' behalf. In real-world applications, multi-agent systems will inevitably interact with untrusted inputs, such as malicious Web content, files, email attachments, etc. Using several recently proposed multi-agent frameworks as concrete examples, we demonstrate that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities. This results in a complete security breach, up to execution of arbitrary malicious code on the user's device and/or exfiltration of sensitive data from the user's containerized environment. We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection, and even if they refuse to perform harmful actions.
Related papers
- SAGA: A Security Architecture for Governing AI Agentic Systems [13.106925341037046]
Large Language Model (LLM)-based agents increasingly interact, collaborate, and delegate tasks to one another autonomously with minimal human interaction.
Industry guidelines for agentic system governance emphasize the need for users to maintain comprehensive control over their agents.
We propose SAGA, a Security Architecture for Governing Agentic systems, that offers user oversight over their agents' lifecycle.
arXiv Detail & Related papers (2025-04-27T23:10:00Z) - Manipulating Multimodal Agents via Cross-Modal Prompt Injection [34.35145839873915]
We identify a critical yet previously overlooked security vulnerability in multimodal agents.
We propose CrossInject, a novel attack framework in which attackers embed adversarial perturbations across multiple modalities.
Our method outperforms existing injection attacks, achieving at least a +26.4% increase in attack success rates.
arXiv Detail & Related papers (2025-04-19T16:28:03Z) - DoomArena: A framework for Testing AI Agents Against Evolving Security Threats [84.94654617852322]
We present DoomArena, a security evaluation framework for AI agents.
It is a plug-in framework and integrates easily into realistic agentic frameworks.
It is modular and decouples the development of attacks from details of the environment in which the agent is deployed.
arXiv Detail & Related papers (2025-04-18T20:36:10Z) - Progent: Programmable Privilege Control for LLM Agents [46.49787947705293]
We introduce Progent, the first privilege control mechanism for LLM agents.
At its core is a domain-specific language for flexibly expressing privilege control policies applied during agent execution.
This enables agent developers and users to craft suitable policies for their specific use cases and enforce them deterministically to guarantee security.
arXiv Detail & Related papers (2025-04-16T01:58:40Z) - Defeating Prompt Injections by Design [79.00910871948787]
CaMeL is a robust defense that creates a protective system layer around the Large Language Models (LLMs)
To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query.
We demonstrate effectiveness of CaMeL by solving $67%$ of tasks with provable security in AgentDojo [NeurIPS 2024], a recent agentic security benchmark.
arXiv Detail & Related papers (2025-03-24T15:54:10Z) - Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach [9.483655213280738]
This paper presents a novel approach to evaluating the security of large language models (LLMs)<n>We define prompt leakage as a critical threat to secure LLM deployment.<n>We implement a multi-agent system where cooperative agents are tasked with probing and exploiting the target LLM to elicit its prompt.
arXiv Detail & Related papers (2025-02-18T08:17:32Z) - AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [84.96249955105777]
LLM agents may pose a greater risk if misused, but their robustness remains underexplored.
We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse.
We find leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z) - Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems [6.480532634073257]
We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents.
This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption.
To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread.
arXiv Detail & Related papers (2024-10-09T11:01:29Z) - On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents.<n>However, the impact of clumsy or even malicious agents, on the overall performance of the system remains underexplored.<n>This paper investigates what is the resilience of various system structures under faulty agents.
arXiv Detail & Related papers (2024-08-02T03:25:20Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.<n>We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.<n>We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents [3.5248694676821484]
We introduce InjecAgent, a benchmark designed to assess the vulnerability of tool-integrated LLM agents to IPI attacks.
InjecAgent comprises 1,054 test cases covering 17 different user tools and 62 attacker tools.
We show that agents are vulnerable to IPI attacks, with ReAct-prompted GPT-4 vulnerable to attacks 24% of the time.
arXiv Detail & Related papers (2024-03-05T06:21:45Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.