Related papers: Teams of LLM Agents can Exploit Zero-Day Vulnerabilities

URL: http://arxiv.org/abs/2406.01637v1
Date: Sun, 2 Jun 2024 16:25:26 GMT
Title: Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Authors: Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
Abstract summary: We show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. We introduce HPTSA, a system of agents with a planning agent that can launch subagents. We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improves over prior work by up to 4.5×.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability, and that they can solve toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. When used alone, prior agents struggle to explore many different vulnerabilities and to plan over long horizons. To resolve this, we introduce HPTSA, a system of agents with a planning agent that can launch subagents. The planning agent explores the system and determines which subagents to call, resolving long-term planning issues when trying different vulnerabilities. We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improves over prior work by up to 4.5×.

Related papers

The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover
Large Language Model (LLM) agents and multi-agent systems introduce unprecedented security vulnerabilities. This paper presents a comprehensive evaluation of the security of LLMs used as reasoning engines within autonomous agents. We focus on how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers.
arXiv (2025-07-09T13:54:58Z)
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv (2025-07-08T16:18:54Z)
Kaleidoscopic Teaming in Multi Agent Simulations
We argue that existing red teaming or safety evaluation frameworks fall short in evaluating safety risks in the complex behaviors, thought processes, and actions of agents. We introduce new in-context optimization techniques that can be used in our kaleidoscopic teaming framework to generate better scenarios for safety analysis. We present appropriate metrics that can be used along with our framework to measure the safety of agents.
arXiv (2025-06-20T23:37:17Z)
AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities. We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o. We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv (2025-05-09T07:40:17Z)
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Failure attribution in LLM multi-agent systems remains underexplored and labor-intensive. We develop and evaluate three automated failure attribution methods, summarizing their corresponding pros and cons. The best method achieves 53.5% accuracy in identifying failure-responsible agents but only 14.2% in pinpointing failure steps.
arXiv (2025-04-30T23:09:44Z)
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs). In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents. We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv (2025-02-12T17:19:36Z)
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
LLM agents may pose a greater risk if misused, but their robustness remains underexplored. We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse. We find leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv (2024-10-11T17:39:22Z)
On the Resilience of Multi-Agent Systems with Malicious Agents
This paper investigates the resilience of multi-agent system structures in the presence of malicious agents. We devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one. We show that two defense methods can enhance system resilience: introducing a mechanism for each agent to challenge others' outputs, or adding an agent that reviews and corrects messages.
arXiv (2024-08-02T03:25:20Z)
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Large language models (LLMs) have experienced significant development and are being deployed in real-world applications. We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions. Our experiments reveal that these attacks can induce failure rates exceeding 80% in multiple scenarios.
arXiv (2024-07-30T14:35:31Z)
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
Existing methods for enhancing the safety of large language models (LLMs) are not directly transferable to LLM-powered agents. We propose GuardAgent, the first LLM agent to act as a guardrail for other LLM agents. GuardAgent works in two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing that code by calling APIs or using external engines.
arXiv (2024-06-13T14:49:26Z)
Security of AI Agents
We identify and describe potential vulnerabilities in AI agents in detail from a system security perspective. We introduce defense mechanisms corresponding to each vulnerability, with designs and experiments to evaluate their viability. This paper contextualizes the security issues in the current development of AI agents and delineates methods to make AI agents safer and more reliable.
arXiv (2024-06-12T23:16:45Z)
LLM Agents can Autonomously Exploit One-day Vulnerabilities
We show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. Our GPT-4 agent requires the CVE description for high performance. Our findings raise questions around the widespread deployment of highly capable LLM agents.
arXiv (2024-04-11T22:07:19Z)
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
We take the first step toward investigating one of the typical safety threats to LLM-based agents: backdoor attacks. Specifically, compared with traditional backdoor attacks on LLMs, which can only manipulate the user inputs and model outputs, agent backdoor attacks exhibit more diverse and covert forms.
arXiv (2024-02-17T06:48:45Z)
LLM Agents can Autonomously Hack Websites
Large language models (LLMs) can now function autonomously as agents. In this work, we show that LLM agents can autonomously hack websites. We also show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild.
arXiv (2024-02-06T14:46:08Z)
TrustAgent: Towards Safe and Trustworthy LLM-based Agents
This paper presents an Agent-Constitution-based agent framework, TrustAgent, with a focus on improving the safety of LLM-based agents. The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: a pre-planning strategy that injects safety knowledge into the model before plan generation, an in-planning strategy that enhances safety during plan generation, and a post-planning strategy that ensures safety through post-planning inspection.
arXiv (2024-02-02T17:26:23Z)
The Rise and Potential of Large Language Model Based Agents: A Survey
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI). We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv (2023-09-14T17:12:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.