PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI
- URL: http://arxiv.org/abs/2507.06742v1
- Date: Wed, 09 Jul 2025 10:56:32 GMT
- Title: PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI
- Authors: Haitham S. Al-Sinani, Chris J. Mitchell
- Abstract summary: PenTest++ is an AI-augmented system combining automation with generative AI to support ethical hacking. PenTest2.0 supports automated privilege escalation driven entirely by Large Language Model reasoning. We describe how it operates, present a proof-of-concept prototype, and discuss its benefits and limitations.
- Score: 2.3020018305241337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ethical hacking today relies on highly skilled practitioners executing complex sequences of commands, which is inherently time-consuming, difficult to scale, and prone to human error. To help mitigate these limitations, we previously introduced 'PenTest++', an AI-augmented system combining automation with generative AI to support ethical hacking workflows. However, a key limitation of PenTest++ was its lack of support for privilege escalation, a crucial element of ethical hacking. In this paper we present 'PenTest2.0', a substantial evolution of PenTest++ supporting automated privilege escalation driven entirely by Large Language Model reasoning. It also incorporates several significant enhancements: 'Retrieval-Augmented Generation', including both online and offline modes; 'Chain-of-Thought' prompting for intermediate reasoning; persistent 'PenTest Task Trees' to track goal progression across turns; and the optional integration of human-authored hints. We describe how it operates, present a proof-of-concept prototype, and discuss its benefits and limitations. We also describe the application of the system to a controlled Linux target, showing that it can carry out multi-turn, adaptive privilege escalation. We explain the rationale behind its core design choices, and provide comprehensive testing results and cost analysis. Our findings indicate that 'PenTest2.0' represents a meaningful step toward practical, scalable, AI-automated penetration testing, whilst highlighting the shortcomings of generative AI systems, particularly their sensitivity to prompt structure, execution context, and semantic drift, reinforcing the need for further research and refinement in this emerging space. Keywords: AI, Ethical Hacking, Privilege Escalation, GenAI, ChatGPT, LLM (Large Language Model), HITL (Human-in-the-Loop)
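The abstract does not give implementation details for the persistent 'PenTest Task Trees', but the idea of tracking goal progression across LLM turns can be sketched with a simple tree of sub-goals. The following is a minimal illustrative sketch, not the paper's actual design; all class and goal names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    """One goal in a hypothetical persistent PenTest Task Tree."""
    goal: str
    status: str = "pending"  # pending | in_progress | done | failed
    children: list["TaskNode"] = field(default_factory=list)

    def add(self, goal: str) -> "TaskNode":
        child = TaskNode(goal)
        self.children.append(child)
        return child

    def next_pending(self) -> "TaskNode | None":
        """Depth-first search for the next unfinished leaf goal."""
        if self.status == "pending" and not self.children:
            return self
        for child in self.children:
            found = child.next_pending()
            if found:
                return found
        return None

# Build a tiny tree for a privilege-escalation engagement.
root = TaskNode("escalate privileges on target")
enum = root.add("enumerate host")
enum.add("check sudo rights (sudo -l)")
enum.add("find SUID binaries")
root.add("exploit chosen vector")

# On each turn, an LLM-driven loop would pick the next pending goal,
# propose a command, and update the node's status from the observed output.
task = root.next_pending()
print(task.goal)
```

The persistence across turns would come from serialising such a tree between LLM calls, so each prompt can include the current goal state rather than relying on conversation history alone.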
Related papers
- SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? [51.112225746095746]
We introduce X-Master, a tool-augmented reasoning agent designed to emulate human researchers.
X-Masters sets a new state-of-the-art record on Humanity's Last Exam with a score of 32.1%.
arXiv Detail & Related papers (2025-07-07T17:50:52Z) - Generative AI Act II: Test Time Scaling Drives Cognition Engineering [28.818378991228563]
"Act II" (2024-present) is where models are transitioning from knowledge-retrieval systems to thought-construction engines through test-time scaling techniques.
This new paradigm establishes a mind-level connection with AI through language-based thoughts.
We systematically break down these advanced approaches through comprehensive tutorials and optimized implementations.
arXiv Detail & Related papers (2025-04-18T17:55:58Z) - General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
Benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems.
We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure.
Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z) - PenTest++: Elevating Ethical Hacking with AI and Automation [2.3020018305241337]
PenTest++ is an AI-augmented system that integrates automation with generative AI (GenAI) to optimise ethical hacking.
It balances automation with human oversight, ensuring informed decision-making at key stages.
arXiv Detail & Related papers (2025-02-13T16:46:23Z) - Hacking, The Lazy Way: LLM Augmented Pentesting [0.0]
We introduce a new concept called "LLM Augmented Pentesting", demonstrated with a tool named "Pentest Copilot".
Our approach focuses on overcoming the traditional resistance to automation in penetration testing by employing LLMs to automate specific sub-tasks.
Pentest Copilot showcases remarkable proficiency in tasks such as utilizing testing tools, interpreting outputs, and suggesting follow-up actions.
arXiv Detail & Related papers (2024-09-14T17:40:35Z) - Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods are very limited to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z) - Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput.
It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results.
The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters (FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z) - Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z) - What Do End-Users Really Want? Investigation of Human-Centered XAI for Mobile Health Apps [69.53730499849023]
We present a user-centered persona concept to evaluate explainable AI (XAI).
Results show that users' demographics and personality, as well as the type of explanation, impact explanation preferences.
Our insights bring an interactive, human-centered XAI closer to practical application.
arXiv Detail & Related papers (2022-10-07T12:51:27Z) - Towards Involving End-users in Interactive Human-in-the-loop AI Fairness [1.889930012459365]
Ensuring fairness in artificial intelligence (AI) is important to counteract bias and discrimination in far-reaching applications.
Recent work has started to investigate how humans judge fairness and how to support machine learning (ML) experts in making their AI models fairer.
Our work explores designing interpretable and interactive human-in-the-loop interfaces that allow ordinary end-users to identify potential fairness issues.
arXiv Detail & Related papers (2022-04-22T02:24:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.