The Role of AI in Modern Penetration Testing
- URL: http://arxiv.org/abs/2512.12326v1
- Date: Sat, 13 Dec 2025 13:34:31 GMT
- Title: The Role of AI in Modern Penetration Testing
- Authors: J. Alexander Curtis, Nasir U. Eisty
- Abstract summary: Penetration testing is a cornerstone of cybersecurity, traditionally driven by manual, time-intensive processes. This systematic literature review examines how Artificial Intelligence (AI) is reshaping penetration testing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Penetration testing is a cornerstone of cybersecurity, traditionally driven by manual, time-intensive processes. As systems grow in complexity, there is a pressing need for more scalable and efficient testing methodologies. This systematic literature review examines how Artificial Intelligence (AI) is reshaping penetration testing, analyzing 58 peer-reviewed studies from major academic databases. Our findings reveal that while AI-assisted pentesting is still in its early stages, notable progress is underway, particularly through Reinforcement Learning (RL), which was the focus of 77% of the reviewed works. Most research centers on the discovery and exploitation phases of pentesting, where AI shows the greatest promise in automating repetitive tasks, optimizing attack strategies, and improving vulnerability identification. Real-world applications remain limited but encouraging, including the European Space Agency's PenBox and various open-source tools. These demonstrate AI's potential to streamline attack path analysis, analyze complex network topology, and reduce manual workload. However, challenges persist: current models often lack flexibility and are underdeveloped for the reconnaissance and post-exploitation phases of pentesting. Applications involving Large Language Models (LLMs) remain relatively under-researched, pointing to a promising direction for future exploration. This paper offers a critical overview of AI's current and potential role in penetration testing, providing valuable insights for researchers, practitioners, and organizations aiming to enhance security assessments through advanced automation or looking for gaps in existing research.
Related papers
- Let the Barbarians In: How AI Can Accelerate Systems Performance Research [80.43506848683633]
We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems. We demonstrate that ADRS-generated solutions can match or even outperform human state-of-the-art designs.
arXiv Detail & Related papers (2025-12-16T18:51:23Z)
- SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations. An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
- FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth [43.606494515048524]
Large language models (LLMs) have sparked growing interest in automatic machine learning research agents. Existing benchmarks tend to overemphasize engineering aspects while neglecting academic rigor. We introduce FML-bench, a benchmark designed to evaluate automatic machine learning research agents on 8 diverse and fundamental machine learning research problems.
arXiv Detail & Related papers (2025-10-12T06:41:05Z)
- Barbarians at the Gate: How AI is Upending Systems Research [58.95406995634148]
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery. We term this approach AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines solutions. Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z)
- SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.26456498576181]
This paper focuses on the development of native Autonomous Single-Agent models for Deep Research. Our best variant SFR-DR-20B achieves up to 28.7% on Humanity's Last Exam benchmark.
arXiv Detail & Related papers (2025-09-08T02:07:09Z)
- AI4Research: A Survey of Artificial Intelligence for Scientific Research [55.5452803680643]
We present a comprehensive survey on AI for Research (AI4Research). We first introduce a systematic taxonomy to classify five mainstream tasks in AI4Research. We identify key research gaps and highlight promising future directions.
arXiv Detail & Related papers (2025-07-02T17:19:20Z)
- ML Research Benchmark [0.0]
We present the ML Research Benchmark (MLRB), comprising 7 competition-level tasks derived from recent machine learning conference tracks.
This paper introduces a novel benchmark and evaluates it using agent scaffolds powered by frontier models, including Claude-3 and GPT-4o.
The results indicate that the Claude-3.5 Sonnet agent performs best across our benchmark, excelling in planning and developing machine learning models.
arXiv Detail & Related papers (2024-10-29T21:38:42Z)
- A Comprehensive Survey of Advanced Persistent Threat Attribution: Taxonomy, Methods, Challenges and Open Research Problems [3.410195565199523]
Advanced Persistent Threat attribution is a critical challenge in cybersecurity.
With the growing prominence of artificial intelligence (AI) and machine learning (ML) techniques, researchers are increasingly focused on developing automated solutions to link cyber threats to responsible actors.
Previous literature on automated threat attribution lacks a systematic review of automated methods and relevant artifacts that can aid in the attribution process.
arXiv Detail & Related papers (2024-09-07T12:42:43Z)
- CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher [1.6652242654250329]
We develop CIPHER (Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers), a large language model specifically trained to assist in penetration testing tasks.
We trained CIPHER using over 300 high-quality write-ups of vulnerable machines, hacking techniques, and documentation of open-source penetration testing tools.
We introduce the Findings, Action, Reasoning, and Results (FARR) Flow augmentation, a novel method to augment penetration testing write-ups to establish a fully automated pentesting simulation benchmark.
arXiv Detail & Related papers (2024-08-21T14:24:04Z)
- AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges [60.56413461109281]
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big data generated by IT Operations processes.
We discuss in depth the key types of data emitted by IT Operations activities, the scale and challenges in analyzing them, and where they can be helpful.
We categorize the key AIOps tasks as incident detection, failure prediction, root cause analysis, and automated actions.
arXiv Detail & Related papers (2023-04-10T15:38:12Z)
- An Exploratory Study of AI System Risk Assessment from the Lens of Data Distribution and Uncertainty [4.99372598361924]
Deep learning (DL) has become a driving force and has been widely adopted in many domains and applications.
This paper initiates an early exploratory study of AI system risk assessment from both the data distribution and uncertainty angles.
arXiv Detail & Related papers (2022-12-13T03:34:25Z)