LLM Agents can Autonomously Hack Websites
- URL: http://arxiv.org/abs/2402.06664v3
- Date: Fri, 16 Feb 2024 04:02:51 GMT
- Title: LLM Agents can Autonomously Hack Websites
- Authors: Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
- Abstract summary: We show that large language models (LLMs) can function autonomously as agents.
In this work, we show that LLM agents can autonomously hack websites.
We also show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild.
- Score: 3.5248694676821484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, large language models (LLMs) have become increasingly
capable and can now interact with tools (i.e., call functions), read documents,
and recursively call themselves. As a result, these LLMs can now function
autonomously as agents. With the rise in capabilities of these agents, recent
work has speculated on how LLM agents would affect cybersecurity. However, not
much is known about the offensive capabilities of LLM agents.
In this work, we show that LLM agents can autonomously hack websites,
performing tasks as complex as blind database schema extraction and SQL
injections without human feedback. Importantly, the agent does not need to know
the vulnerability beforehand. This capability is uniquely enabled by frontier
models that are highly capable of tool use and leveraging extended context.
Namely, we show that GPT-4 is capable of such hacks, but existing open-source
models are not. Finally, we show that GPT-4 is capable of autonomously finding
vulnerabilities in websites in the wild. Our findings raise questions about the
widespread deployment of LLMs.
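For context, the sketch below illustrates the kind of tool-calling agent loop the abstract refers to: a model that is given a task, can call functions, read the results, and keep iterating until it decides it is done. This is a minimal illustrative sketch, not the authors' agent; it assumes the OpenAI Chat Completions function-calling API, uses "gpt-4" as a placeholder model name, and exposes only a benign page-fetching tool rather than any exploitation logic.

```python
# Minimal sketch of an autonomous tool-calling agent loop (assumption-laden,
# not the paper's implementation). The LLM is given a task, may request a
# function call, receives the result, and repeats until it stops calling tools.
import json
import urllib.request

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def fetch_page(url: str) -> str:
    """Benign placeholder tool: fetch a URL and return the first 4000 characters."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read(4000).decode("utf-8", errors="replace")


TOOLS = [{
    "type": "function",
    "function": {
        "name": "fetch_page",
        "description": "Fetch the contents of a web page.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]


def run_agent(task: str, max_steps: int = 10) -> str:
    """Loop: query the model, execute any tool calls it requests, feed results back."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4",          # placeholder model name
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
        ).choices[0].message
        messages.append(reply)
        if not reply.tool_calls:     # no further tool use: the agent is done
            return reply.content
        for call in reply.tool_calls:  # execute each requested function call
            if call.function.name == "fetch_page":
                args = json.loads(call.function.arguments)
                result = fetch_page(**args)
            else:
                result = "unknown tool"
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    return "step limit reached"


if __name__ == "__main__":
    print(run_agent("Summarize the page at https://example.com"))
```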
Related papers
- GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
Existing methods for enhancing the safety of large language models (LLMs) are not directly transferable to LLM-powered agents.
We propose GuardAgent, the first LLM agent that serves as a guardrail for other LLM agents.
GuardAgent comprises two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines.
arXiv Detail & Related papers (2024-06-13T14:49:26Z) - BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents [26.057916556444333]
We show that LLM agents constructed by fine-tuning are vulnerable to our proposed backdoor attack, BadAgent.
Our proposed attack methods are extremely robust even after fine-tuning on trustworthy data.
arXiv Detail & Related papers (2024-06-05T07:14:28Z) - Teams of LLM Agents can Exploit Zero-Day Vulnerabilities [3.2855317710497625]
We show that teams of LLM agents can exploit real-world, zero-day vulnerabilities.
We introduce HPTSA, a system of agents with a planning agent that can launch subagents.
We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improves over prior work by up to 4.5×.
arXiv Detail & Related papers (2024-06-02T16:25:26Z) - Backdoor Removal for Generative Large Language Models [42.19147076519423]
Generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks, from understanding to reasoning.
A malicious adversary may publish poisoned data online and conduct backdoor attacks on the victim LLMs pre-trained on the poisoned data.
We present Simulate and Eliminate (SANDE) to erase the undesired backdoored mappings for generative LLMs.
arXiv Detail & Related papers (2024-05-13T11:53:42Z) - LLM Agents can Autonomously Exploit One-day Vulnerabilities [2.3999111269325266]
We show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems.
Our GPT-4 agent requires the CVE description for high performance.
Our findings raise questions around the widespread deployment of highly capable LLM agents.
arXiv Detail & Related papers (2024-04-11T22:07:19Z) - EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z) - Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents [50.034049716274005]
We take the first step to investigate one of the typical safety threats, backdoor attack, to LLM-based agents.
We first formulate a general framework of agent backdoor attacks, and then present a thorough analysis of their different forms.
We propose the corresponding data poisoning mechanisms to implement the above variations of agent backdoor attacks on two typical agent tasks.
arXiv Detail & Related papers (2024-02-17T06:48:45Z) - GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [81.44231422624055]
A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks.
In this paper, we introduce GitAgent, an agent capable of autonomously extending its tool set with repositories from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z) - Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can use Prompt Injection attacks to override an application's original instructions and its controls.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.