Related papers: Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements

Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements

URL: http://arxiv.org/abs/2410.17141v2
Date: Fri, 25 Oct 2024 16:58:37 GMT
Title: Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
Authors: Isamu Isozaki, Manil Shrestha, Rick Console, Edward Kim,
Abstract summary: Large language models (LLMs) have shown potential across various domains, including cybersecurity. There is currently no comprehensive, open, end-to-end automated penetration testing benchmark. This paper introduces a novel open benchmark for LLM-based automated penetration testing.
Score: 1.4433703131122861
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hacking poses a significant threat to cybersecurity, inflicting billions of dollars in damages annually. To mitigate these risks, ethical hacking, or penetration testing, is employed to identify vulnerabilities in systems and networks. Recent advancements in large language models (LLMs) have shown potential across various domains, including cybersecurity. However, there is currently no comprehensive, open, end-to-end automated penetration testing benchmark to drive progress and evaluate the capabilities of these models in security contexts. This paper introduces a novel open benchmark for LLM-based automated penetration testing, addressing this critical gap. We first evaluate the performance of LLMs, including GPT-4o and Llama 3.1-405B, using the state-of-the-art PentestGPT tool. Our findings reveal that while Llama 3.1 demonstrates an edge over GPT-4o, both models currently fall short of performing fully automated, end-to-end penetration testing. Next, we advance the state-of-the-art and present ablation studies that provide insights into improving the PentestGPT tool. Our research illuminates the challenges LLMs face in each aspect of Pentesting, e.g. enumeration, exploitation, and privilege escalation. This work contributes to the growing body of knowledge on AI-assisted cybersecurity and lays the foundation for future research in automated penetration testing using large language models.

Related papers

CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.<n>Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.<n>We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation. Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z)
Global Challenge for Safe and Secure LLMs Track 1 [57.08717321907755]
The Global Challenge for Safe and Secure Large Language Models (LLMs) is a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks.
arXiv Detail & Related papers (2024-11-21T08:20:31Z)
PentestAgent: Incorporating LLM Agents to Automated Penetration Testing [6.815381197173165]
Manual penetration testing is time-consuming and expensive. Recent advancements in large language models (LLMs) offer new opportunities for enhancing penetration testing. We propose PentestAgent, a novel LLM-based automated penetration testing framework.
arXiv Detail & Related papers (2024-11-07T21:10:39Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [63.603861880022954]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z)
Hacking, The Lazy Way: LLM Augmented Pentesting [0.0]
"LLM Augmented Pentesting" is demonstrated through a tool named "Pentest Copilot" Our research includes a "chain of thought" mechanism to streamline token usage and boost performance. We propose a novel file analysis approach, enabling LLMs to understand files.
arXiv Detail & Related papers (2024-09-14T17:40:35Z)
CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher [1.6652242654250329]
We develop CIPHER (Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers), a large language model specifically trained to assist in penetration testing tasks. We trained CIPHER using over 300 high-quality write-ups of vulnerable machines, hacking techniques, and documentation of open-source penetration testing tools. We introduce the Findings, Action, Reasoning, and Results (FARR) Flow augmentation, a novel method to augment penetration testing write-ups to establish a fully automated pentesting simulation benchmark.
arXiv Detail & Related papers (2024-08-21T14:24:04Z)
Automated Text Scoring in the Age of Generative AI for the GPU-poor [49.1574468325115]
We analyze the performance and efficiency of open-source, small-scale generative language models for automated text scoring. Results show that GLMs can be fine-tuned to achieve adequate, though not state-of-the-art, performance.
arXiv Detail & Related papers (2024-07-02T01:17:01Z)
Selene: Pioneering Automated Proof in Software Verification [62.09555413263788]
We introduce Selene, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation.
arXiv Detail & Related papers (2024-01-15T13:08:38Z)
Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review [51.31851488650698]
Machine learning (ML) sees an increasing prevalence of being used in the internet-of-things (IoT)-based smart grid. adversarial distortion injected into the power signal will greatly affect the system's normal control and operation. It is imperative to conduct vulnerability assessment for MLsgAPPs applied in the context of safety-critical power systems.
arXiv Detail & Related papers (2023-08-30T03:29:26Z)
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool [20.449761406790415]
Large Language Models (LLMs) have shown significant advancements in various domains. We evaluate the performance of LLMs on real-world penetration testing tasks using a robust benchmark created from test machines with platforms. We introduce PentestGPT, an LLM-empowered automatic penetration testing tool.
arXiv Detail & Related papers (2023-08-13T14:35:50Z)
Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings. In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged. Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.