PentestAgent: Incorporating LLM Agents to Automated Penetration Testing
- URL: http://arxiv.org/abs/2411.05185v1
- Date: Thu, 07 Nov 2024 21:10:39 GMT
- Title: PentestAgent: Incorporating LLM Agents to Automated Penetration Testing
- Authors: Xiangmin Shen, Lingzhi Wang, Zhenyuan Li, Yan Chen, Wencheng Zhao, Dawei Sun, Jiashui Wang, Wei Ruan,
- Abstract summary: Manual penetration testing is time-consuming and expensive.
Recent advancements in large language models (LLMs) offer new opportunities for enhancing penetration testing.
We propose PentestAgent, a novel LLM-based automated penetration testing framework.
- Score: 6.815381197173165
- Abstract: Penetration testing is a critical technique for identifying security vulnerabilities, traditionally performed manually by skilled security specialists. This complex process involves gathering information about the target system, identifying entry points, exploiting the system, and reporting findings. Despite its effectiveness, manual penetration testing is time-consuming and expensive, often requiring significant expertise and resources that many organizations cannot afford. While automated penetration testing methods have been proposed, they often fall short in real-world applications due to limitations in flexibility, adaptability, and implementation. Recent advancements in large language models (LLMs) offer new opportunities for enhancing penetration testing through increased intelligence and automation. However, current LLM-based approaches still face significant challenges, including limited penetration testing knowledge and a lack of comprehensive automation capabilities. To address these gaps, we propose PentestAgent, a novel LLM-based automated penetration testing framework that leverages the power of LLMs and various LLM-based techniques like Retrieval Augmented Generation (RAG) to enhance penetration testing knowledge and automate various tasks. Our framework leverages multi-agent collaboration to automate intelligence gathering, vulnerability analysis, and exploitation stages, reducing manual intervention. We evaluate PentestAgent using a comprehensive benchmark, demonstrating superior performance in task completion and overall efficiency. This work significantly advances the practical applicability of automated penetration testing systems.
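The abstract describes a pipeline in which retrieval-augmented generation supplies pentesting knowledge to cooperating agents for intelligence gathering, vulnerability analysis, and exploitation. The sketch below is only an illustration of that idea, not PentestAgent's implementation: the names (`KnowledgeBase`, `call_llm`, `run_stage`) and the keyword-overlap retriever are hypothetical stand-ins, and the LLM call is stubbed so the example runs offline.
```python
"""Illustrative sketch of a RAG-assisted, multi-stage pentest agent loop.

NOT PentestAgent's actual implementation; all names are hypothetical and
the LLM call is stubbed so the example runs without any model access.
"""
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


class KnowledgeBase:
    """Toy retriever: ranks documents by keyword overlap with the query."""

    def __init__(self, docs):
        self.docs = docs

    def retrieve(self, query, k=2):
        q_terms = set(query.lower().split())
        return sorted(
            self.docs,
            key=lambda d: len(q_terms & set(d.text.lower().split())),
            reverse=True,
        )[:k]


def call_llm(role, prompt):
    """Stub for an LLM call; a real agent would query a model client here."""
    return f"[{role}] would act on:\n{prompt[:200]}..."


def run_stage(role, task, kb):
    """One agent stage: retrieve supporting knowledge, then prompt the LLM."""
    context = "\n".join(d.text for d in kb.retrieve(task))
    prompt = f"Task: {task}\nRelevant knowledge:\n{context}"
    return call_llm(role, prompt)


if __name__ == "__main__":
    kb = KnowledgeBase([
        Document("recon", "nmap service scan reveals open ports and versions"),
        Document("vuln", "outdated web framework versions map to known CVEs"),
        Document("exploit", "public exploit modules target specific CVE identifiers"),
    ])
    # The three stages the abstract mentions, run as cooperating agents.
    for role, task in [
        ("intelligence-gathering agent", "enumerate services on the target host"),
        ("vulnerability-analysis agent", "map discovered versions to candidate CVEs"),
        ("exploitation agent", "select and plan an exploit for the top candidate CVE"),
    ]:
        print(run_stage(role, task, kb))
```
In a real system the stubbed retriever would be replaced by a vector store over pentesting documentation and the stub LLM call by an actual model client; the point here is only the control flow of retrieval feeding each agent stage.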
Related papers
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements [1.4433703131122861]
Large language models (LLMs) have shown potential across various domains, including cybersecurity.
There is currently no comprehensive, open, end-to-end automated penetration testing benchmark.
This paper introduces a novel open benchmark for LLM-based automated penetration testing.
arXiv Detail & Related papers (2024-10-22T16:18:41Z) - Hacking, The Lazy Way: LLM Augmented Pentesting [0.0]
"LLM Augmented Pentesting" is demonstrated through a tool named "Pentest Copilot"
Our research includes a "chain of thought" mechanism to streamline token usage and boost performance.
We propose a novel file analysis approach, enabling LLMs to understand files.
arXiv Detail & Related papers (2024-09-14T17:40:35Z) - MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
The Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z) - AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses.
Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies.
We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z) - Test Oracle Automation in the era of LLMs [52.69509240442899]
Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks.
This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles.
arXiv Detail & Related papers (2024-05-21T13:19:10Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z) - Identifying the Risks of LM Agents with an LM-Emulated Sandbox [68.26587052548287]
Language Model (LM) agents and tools enable a rich set of capabilities but also amplify potential risks.
The high cost of testing these agents makes it increasingly difficult to find high-stakes, long-tailed risks.
We introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios.
arXiv Detail & Related papers (2023-09-25T17:08:02Z) - PentestGPT: An LLM-empowered Automatic Penetration Testing Tool [20.449761406790415]
Large Language Models (LLMs) have shown significant advancements in various domains.
We evaluate the performance of LLMs on real-world penetration testing tasks using a robust benchmark created from test machines hosted on penetration testing practice platforms.
We introduce PentestGPT, an LLM-empowered automatic penetration testing tool.
arXiv Detail & Related papers (2023-08-13T14:35:50Z) - Getting pwn'd by AI: Penetration Testing with Large Language Models [0.0]
This paper explores the potential usage of large language models, such as GPT-3.5, to augment penetration testers with AI sparring partners.
We explore the feasibility of supplementing penetration testers with AI models for two distinct use cases: high-level task planning for security testing assignments and low-level vulnerability hunting within a vulnerable virtual machine.
arXiv Detail & Related papers (2023-07-24T19:59:22Z)