AWE: Adaptive Agents for Dynamic Web Penetration Testing
- URL: http://arxiv.org/abs/2603.00960v1
- Date: Sun, 01 Mar 2026 07:32:42 GMT
- Title: AWE: Adaptive Agents for Dynamic Web Penetration Testing
- Authors: Akshat Singh Jaswal, Ashish Baghel,
- Abstract summary: AWE is a memory-augmented multi-agent framework for autonomous web penetration testing. It embeds structured, vulnerability-specific analysis pipelines within a lightweight LLM orchestration layer. AWE achieves substantial gains on injection-class vulnerabilities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern web applications are increasingly produced through AI-assisted development and rapid no-code deployment pipelines, widening the gap between accelerating software velocity and the limited adaptability of existing security tooling. Pattern-driven scanners fail to reason about novel contexts, while emerging LLM-based penetration testers rely on unconstrained exploration, yielding high cost, unstable behavior, and poor reproducibility. We introduce AWE, a memory-augmented multi-agent framework for autonomous web penetration testing that embeds structured, vulnerability-specific analysis pipelines within a lightweight LLM orchestration layer. Unlike general-purpose agents, AWE couples context-aware payload mutation and generation with persistent memory and browser-backed verification to produce deterministic, exploitation-driven results. Evaluated on the 104-challenge XBOW benchmark, AWE achieves substantial gains on injection-class vulnerabilities - 87% XSS success (+30.5% over MAPTA) and 66.7% blind SQL injection success (+33.3%) - while being much faster, cheaper, and more token-efficient than MAPTA, despite using a mid-tier model (Claude Sonnet 4) versus MAPTA's GPT-5. MAPTA retains higher overall coverage due to broader exploratory capabilities, underscoring the complementary strengths of specialized and general-purpose architectures. Our results demonstrate that architecture matters as much as model reasoning capability: integrating LLMs into principled, vulnerability-aware pipelines yields substantial gains in accuracy, efficiency, and determinism for injection-class exploits. The source code for AWE is available at: https://github.com/stuxlabs/AWE
Related papers
- Sifting the Noise: A Comparative Study of LLM Agents in Vulnerability False Positive Filtering [2.5335007441696384]
  Static Application Security Testing (SAST) tools are essential for identifying software vulnerabilities. SAST tools often produce a high volume of false positives (FPs). Recent advances in Large Language Model (LLM) agents offer a promising direction.
  arXiv Detail & Related papers (2026-01-30T13:14:55Z)
- RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
  Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
  arXiv Detail & Related papers (2026-01-30T08:29:01Z)
- ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack [52.17935054046577]
  We present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. ReasAlign incorporates structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks.
  arXiv Detail & Related papers (2026-01-15T08:23:38Z)
- What Limits Agentic Systems Efficiency? [6.355808944609144]
  Existing research predominantly focuses on reasoning performance, often neglecting the efficiency of agentic systems. We decompose end-to-end latency into two primary components: API latency and web environment latency. We propose SpecCache, a caching framework augmented with speculative execution that can reduce web environment overhead.
  arXiv Detail & Related papers (2025-10-18T00:21:45Z)
- AI Agentic Vulnerability Injection And Transformation with Optimized Reasoning [2.918225266151982]
  We present AVIATOR, the first AI-agentic vulnerability injection workflow. It automatically injects realistic, category-specific vulnerabilities for high-fidelity, diverse, large-scale vulnerability dataset generation. It combines semantic analysis, injection synthesis enhanced with LoRA-based fine-tuning and Retrieval-Augmented Generation, as well as post-injection validation via static analysis and LLM-based discriminators.
  arXiv Detail & Related papers (2025-08-28T14:59:39Z)
- AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
  We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities. We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o. We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
  arXiv Detail & Related papers (2025-05-09T07:40:17Z)
- AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
  AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling agentic reasoning systems at test time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
  arXiv Detail & Related papers (2025-04-29T17:36:05Z)
- EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
  EXPLICATE is a framework that enhances phishing detection through a three-component architecture. It is on par with existing deep learning techniques but has better explainability. It addresses the critical divide between automated AI and user trust in phishing detection systems.
  arXiv Detail & Related papers (2025-03-22T23:37:35Z)
- Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs [6.517076600304129]
  Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. We propose an approach integrating Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline. RAG enhances the robustness of the output by incorporating external knowledge sources, while the Self-Ranking technique, inspired by the concept of Self-Consistency, generates multiple reasoning paths and creates ranks to select the most robust detector.
  arXiv Detail & Related papers (2024-11-27T10:48:37Z)
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
  We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
  arXiv Detail & Related papers (2024-11-02T13:24:30Z)
- Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [50.980446687774645]
  We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs. It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
  arXiv Detail & Related papers (2024-10-24T06:36:12Z)