LLM Agents can Autonomously Hack Websites
- URL: http://arxiv.org/abs/2402.06664v3
- Date: Fri, 16 Feb 2024 04:02:51 GMT
- Title: LLM Agents can Autonomously Hack Websites
- Authors: Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
- Abstract summary: We show that large language models (LLMs) can function autonomously as agents.
In this work, we show that LLM agents can autonomously hack websites.
We also show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild.
- Score: 3.5248694676821484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, large language models (LLMs) have become increasingly
capable and can now interact with tools (i.e., call functions), read documents,
and recursively call themselves. As a result, these LLMs can now function
autonomously as agents. With the rise in capabilities of these agents, recent
work has speculated on how LLM agents would affect cybersecurity. However, not
much is known about the offensive capabilities of LLM agents.
In this work, we show that LLM agents can autonomously hack websites,
performing tasks as complex as blind database schema extraction and SQL
injections without human feedback. Importantly, the agent does not need to know
the vulnerability beforehand. This capability is uniquely enabled by frontier
models that are highly capable of tool use and leveraging extended context.
Namely, we show that GPT-4 is capable of such hacks, but existing open-source
models are not. Finally, we show that GPT-4 is capable of autonomously finding
vulnerabilities in websites in the wild. Our findings raise questions about the
widespread deployment of LLMs.
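As a rough illustration of the agent setup the abstract describes (an LLM that can call tools, read documents, and invoke itself in a loop), here is a minimal Python sketch of such an agent loop. It is not the paper's implementation; the call_llm helper, the message format, and the registered tools are assumptions made for illustration.

```python
# Minimal sketch of an LLM agent loop (illustrative only, not the paper's code).
# The model backend, tool set, and message format are assumptions.
import json
from typing import Callable, Dict, List


def call_llm(messages: List[dict]) -> dict:
    """Placeholder for a chat-completion call with tool/function calling.

    A real implementation would query a frontier model and return either
    {"tool": name, "args": {...}} or {"answer": text}.
    """
    raise NotImplementedError("wire up a model API here")


def read_document(url: str) -> str:
    """Placeholder tool: fetch a document or web page for the agent to read."""
    raise NotImplementedError


TOOLS: Dict[str, Callable[..., str]] = {
    "read_document": read_document,
    # An agent framework would register further tools here,
    # e.g. an HTTP client or a headless browser.
}


def run_agent(goal: str, max_steps: int = 10) -> str:
    """Plan-act-observe loop: ask the model, run requested tools, feed results back."""
    messages: List[dict] = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "answer" in reply:               # the model decided it is done
            return reply["answer"]
        tool = TOOLS[reply["tool"]]         # the model requested a tool call
        result = tool(**reply["args"])
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    return "step budget exhausted"
```

The point the abstract relies on is only this loop structure: the model plans, requests tool calls, observes the results in an extended context, and iterates without a human in the loop.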
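For context on the vulnerability class named in the abstract, the sketch below shows why string-built SQL queries are injectable and how parameterized queries close the hole. It uses an in-memory SQLite database with made-up table and column names; it does not reproduce the paper's attacks.

```python
# Defensive illustration of the SQL-injection vulnerability class.
# The schema and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 1)")

user_input = "' OR '1'='1"  # classic textbook injection payload

# Vulnerable: user input is concatenated into the SQL string, so the
# WHERE clause becomes a tautology and every row is returned.
vulnerable = conn.execute(
    "SELECT name FROM users WHERE name = '" + user_input + "'"
).fetchall()
print(vulnerable)  # [('alice',), ('bob',)]

# Safe: a parameterized query treats the input as data, not SQL,
# so the malicious string simply matches no user name.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(safe)  # []
```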
Related papers
- When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs [26.2943792874156]
We investigate the risks associated with misuse of Large Language Models (LLMs) in cyberattacks involving personal data.
Specifically, we aim to understand how potent LLM agents can be when directed to conduct cyberattacks.
We examine three attack scenarios: the collection of Personally Identifiable Information (PII), the generation of impersonation posts, and the creation of spear-phishing emails.
arXiv Detail & Related papers (2024-10-18T16:16:34Z)
- AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [84.96249955105777]
LLM agents may pose a greater risk if misused, but their robustness remains underexplored.
We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse.
We find leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z)
- MEGen: Generative Backdoor in Large Language Models via Model Editing [56.46183024683885]
Large language models (LLMs) have demonstrated remarkable capabilities.
Their powerful generative abilities enable flexible responses based on various queries or instructions.
This paper proposes an editing-based generative backdoor, named MEGen, which aims to create a customized backdoor for NLP tasks with minimal side effects.
arXiv Detail & Related papers (2024-08-20T10:44:29Z)
- GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
Existing methods for enhancing the safety of large language models (LLMs) are not directly transferable to LLM-powered agents.
We propose GuardAgent, the first LLM agent that serves as a guardrail for other LLM agents.
GuardAgent comprises two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines.
arXiv Detail & Related papers (2024-06-13T14:49:26Z)
- BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents [26.057916556444333]
We show that LLM agents constructed by fine-tuning are vulnerable to our proposed backdoor attack, named BadAgent.
Our proposed attack methods are extremely robust even after fine-tuning on trustworthy data.
arXiv Detail & Related papers (2024-06-05T07:14:28Z)
- Teams of LLM Agents can Exploit Zero-Day Vulnerabilities [3.2855317710497625]
We show that teams of LLM agents can exploit real-world, zero-day vulnerabilities.
We introduce HPTSA, a system of agents with a planning agent that can launch subagents.
We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improves over prior work by up to 4.5×.
arXiv Detail & Related papers (2024-06-02T16:25:26Z)
- AGILE: A Novel Reinforcement Learning Framework of LLM Agents [7.982249117182315]
We introduce a novel reinforcement learning framework of LLM agents designed to perform complex conversational tasks with users.
The agent possesses capabilities beyond conversation, including reflection, tool usage, and expert consultation.
Our experiments show that AGILE agents based on 7B and 13B LLMs trained with PPO can outperform GPT-4 agents.
arXiv Detail & Related papers (2024-05-23T16:17:44Z)
- LLM Agents can Autonomously Exploit One-day Vulnerabilities [2.3999111269325266]
We show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems.
Our GPT-4 agent requires the CVE description for high performance.
Our findings raise questions around the widespread deployment of highly capable LLM agents.
arXiv Detail & Related papers (2024-04-11T22:07:19Z)
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z)
- Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents [47.219047422240145]
We take the first step to investigate one of the typical safety threats, backdoor attack, to LLM-based agents.
Specifically, compared with traditional backdoor attacks on LLMs, which can only manipulate user inputs and model outputs, agent backdoor attacks take more diverse and covert forms.
arXiv Detail & Related papers (2024-02-17T06:48:45Z)
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.