STAF: Leveraging LLMs for Automated Attack Tree-Based Security Test Generation
- URL: http://arxiv.org/abs/2509.20190v1
- Date: Wed, 24 Sep 2025 14:46:42 GMT
- Title: STAF: Leveraging LLMs for Automated Attack Tree-Based Security Test Generation
- Authors: Tanmay Khule, Stefan Marksteiner, Jose Alguindigue, Hannes Fuchs, Sebastian Fischmeister, Apurva Narayan,
- Abstract summary: This paper introduces STAF (Security Test Automation Framework), a novel approach to automating security test case generation.<n>We show the elements and processes needed to produce sensible and executable automotive security test suites, along with the integration with an automated testing framework.
- Score: 3.283937504286784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern automotive development, security testing is critical for safeguarding systems against increasingly advanced threats. Attack trees are widely used to systematically represent potential attack vectors, but generating comprehensive test cases from these trees remains a labor-intensive, error-prone task that has seen limited automation in the context of testing vehicular systems. This paper introduces STAF (Security Test Automation Framework), a novel approach to automating security test case generation. Leveraging Large Language Models (LLMs) and a four-step self-corrective Retrieval-Augmented Generation (RAG) framework, STAF automates the generation of executable security test cases from attack trees, providing an end-to-end solution that encompasses the entire attack surface. We particularly show the elements and processes needed to provide an LLM to actually produce sensible and executable automotive security test suites, along with the integration with an automated testing framework. We further compare our tailored approach with general purpose (vanilla) LLMs and the performance of different LLMs (namely GPT-4.1 and DeepSeek) using our approach. We also demonstrate the method of our operation step-by-step in a concrete case study. Our results show significant improvements in efficiency, accuracy, scalability, and easy integration in any workflow, marking a substantial advancement in automating automotive security testing methodologies. Using TARAs as an input for verfication tests, we create synergies by connecting two vital elements of a secure automotive development process.
Related papers
- RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area.<n>We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z) - Automated Safety Benchmarking: A Multi-agent Pipeline for LVLMs [61.01470415470677]
Large vision-language models (LVLMs) exhibit remarkable capabilities in cross-modal tasks but face significant safety challenges.<n>Existing benchmarks are hindered by their labor-intensive construction process, static complexity, and limited discriminative power.<n>We propose VLSafetyBencher, the first automated system for LVLM safety benchmarking.
arXiv Detail & Related papers (2026-01-27T11:51:30Z) - PentestEval: Benchmarking LLM-based Penetration Testing with Modular and Stage-Level Design [30.68819474524929]
PentestEval is the first comprehensive benchmark for evaluating Large Language Models (LLMs) across six penetration testing stages.<n>It integrates expert-annotated ground truth with a fully automated evaluation pipeline across 346 tasks covering all stages in 12 realistic vulnerable scenarios.<n>Our stage-level evaluation of 9 widely used LLMs reveals generally weak performance and distinct limitations across the stages of penetration-testing workflow.
arXiv Detail & Related papers (2025-12-16T09:37:21Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications.<n>We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation.<n>We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
arXiv Detail & Related papers (2025-05-23T10:56:06Z) - ThreatLens: LLM-guided Threat Modeling and Test Plan Generation for Hardware Security Verification [1.0816123715383426]
ThreatLens is a multi-agent framework that automates security threat modeling and test plan generation for hardware security verification.<n>The framework reduces the manual verification effort, enhances coverage, and ensures a structured, adaptable approach to security verification.
arXiv Detail & Related papers (2025-05-11T03:10:39Z) - PentestAgent: Incorporating LLM Agents to Automated Penetration Testing [6.815381197173165]
Manual penetration testing is time-consuming and expensive.<n>Recent advancements in large language models (LLMs) offer new opportunities for enhancing penetration testing.<n>We propose PentestAgent, a novel LLM-based automated penetration testing framework.
arXiv Detail & Related papers (2024-11-07T21:10:39Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - Hacking, The Lazy Way: LLM Augmented Pentesting [0.0]
We introduce a new concept called "LLM Augmented Pentesting" demonstrated with a tool named "Pentest Copilot"<n>Our approach focuses on overcoming the traditional resistance to automation in penetration testing by employing LLMs to automate specific sub-tasks.<n>Pentest Copilot showcases remarkable proficiency in tasks such as utilizing testing tools, interpreting outputs, and suggesting follow-up actions.
arXiv Detail & Related papers (2024-09-14T17:40:35Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.