LLM Agents can Autonomously Hack Websites
- URL: http://arxiv.org/abs/2402.06664v3
- Date: Fri, 16 Feb 2024 04:02:51 GMT
- Title: LLM Agents can Autonomously Hack Websites
- Authors: Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
- Abstract summary: We show that large language models (LLMs) can function autonomously as agents.
In this work, we show that LLM agents can autonomously hack websites.
We also show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild.
- Score: 3.5248694676821484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, large language models (LLMs) have become increasingly
capable and can now interact with tools (i.e., call functions), read documents,
and recursively call themselves. As a result, these LLMs can now function
autonomously as agents. With the rise in capabilities of these agents, recent
work has speculated on how LLM agents would affect cybersecurity. However, not
much is known about the offensive capabilities of LLM agents.
In this work, we show that LLM agents can autonomously hack websites,
performing tasks as complex as blind database schema extraction and SQL
injections without human feedback. Importantly, the agent does not need to know
the vulnerability beforehand. This capability is uniquely enabled by frontier
models that are highly capable of tool use and leveraging extended context.
Namely, we show that GPT-4 is capable of such hacks, but existing open-source
models are not. Finally, we show that GPT-4 is capable of autonomously finding
vulnerabilities in websites in the wild. Our findings raise questions about the
widespread deployment of LLMs.
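The abstract's notion of an "agent" — a model that calls tools, reads the results, and recursively continues until done — can be sketched minimally as follows. This is an illustrative assumption, not the paper's implementation: the tool (`fetch_page`), the stand-in policy, and the loop (`run_agent`) are all hypothetical names for the general pattern.

```python
def fetch_page(url: str) -> str:
    """Hypothetical tool: return a canned document for the agent to read."""
    return f"<html>contents of {url}</html>"

TOOLS = {"fetch_page": fetch_page}

def mock_policy(history):
    """Stand-in for an LLM: choose the next action from the transcript so far."""
    if not any(step[0] == "tool_result" for step in history):
        return ("call_tool", "fetch_page", "http://example.com")
    return ("finish", "summarized the fetched page")

def run_agent(policy, tools, max_steps=5):
    """Agent loop: ask the policy for an action, run tools, feed results back."""
    history = []
    for _ in range(max_steps):
        action = policy(history)
        if action[0] == "finish":
            return action[1], history
        _, name, arg = action
        result = tools[name](arg)                # execute the requested tool
        history.append(("tool_result", result))  # feed the observation back
    return None, history

answer, trace = run_agent(mock_policy, TOOLS)
```

In a real agent the `mock_policy` stand-in is an LLM call, and the paper's claim is that frontier models driving this loop can chain many such tool calls over an extended context without human feedback.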
Related papers
- Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs).
In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents.
We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z)
- AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [84.96249955105777]
LLM agents may pose a greater risk if misused, but their robustness remains underexplored.
We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse.
We find leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z)
- Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments [0.5735035463793008]
Large Language Models (LLMs) have shown remarkable potential across various domains, including cybersecurity.
We present Hackphyr, a locally fine-tuned LLM to be used as a red-team agent within network security environments.
arXiv Detail & Related papers (2024-09-17T15:28:25Z)
- MEGen: Generative Backdoor in Large Language Models via Model Editing [56.46183024683885]
Large language models (LLMs) have demonstrated remarkable capabilities.
Their powerful generative abilities enable flexible responses based on various queries or instructions.
This paper proposes an editing-based generative backdoor, named MEGen, aiming to create a customized backdoor for NLP tasks with the least side effects.
arXiv Detail & Related papers (2024-08-20T10:44:29Z)
- GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
Existing methods for enhancing the safety of large language models (LLMs) are not directly transferable to LLM-powered agents.
We propose GuardAgent, the first LLM agent as a guardrail to other LLM agents.
GuardAgent comprises two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines.
arXiv Detail & Related papers (2024-06-13T14:49:26Z)
- BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents [26.057916556444333]
We show that such methods are vulnerable to our proposed backdoor attacks named BadAgent.
Our proposed attack methods are extremely robust even after fine-tuning on trustworthy data.
arXiv Detail & Related papers (2024-06-05T07:14:28Z)
- Teams of LLM Agents can Exploit Zero-Day Vulnerabilities [3.2855317710497625]
We show that teams of LLM agents can exploit real-world, zero-day vulnerabilities.
We introduce HPTSA, a system of agents with a planning agent that can launch subagents.
We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improves over prior work by up to 4.5×.
arXiv Detail & Related papers (2024-06-02T16:25:26Z)
- AGILE: A Novel Reinforcement Learning Framework of LLM Agents [7.982249117182315]
We introduce a novel reinforcement learning framework of LLM agents designed to perform complex conversational tasks with users.
The agent possesses capabilities beyond conversation, including reflection, tool usage, and expert consultation.
Our experiments show that AGILE agents based on 7B and 13B LLMs trained with PPO can outperform GPT-4 agents.
arXiv Detail & Related papers (2024-05-23T16:17:44Z)
- LLM Agents can Autonomously Exploit One-day Vulnerabilities [2.3999111269325266]
We show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems.
Our GPT-4 agent requires the CVE description for high performance.
Our findings raise questions around the widespread deployment of highly capable LLM agents.
arXiv Detail & Related papers (2024-04-11T22:07:19Z)
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z)
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.