CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution
- URL: http://arxiv.org/abs/2505.17107v1
- Date: Wed, 21 May 2025 11:01:11 GMT
- Title: CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution
- Authors: Minghao Shao, Haoran Xi, Nanda Rani, Meet Udeshi, Venkata Sai Charan Putrevu, Kimberly Milner, Brendan Dolan-Gavitt, Sandeep Kumar Shukla, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique,
- Abstract summary: Large Language Model (LLM) agents can automate cybersecurity tasks and can adapt to the evolving cybersecurity landscape without re-engineering.<n>But they have two key limitations: accessing latest cybersecurity expertise beyond training data, and integrating new knowledge into complex task planning.<n>We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms.
- Score: 22.86304661035188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM) agents can automate cybersecurity tasks and can adapt to the evolving cybersecurity landscape without re-engineering. While LLM agents have demonstrated cybersecurity capabilities on Capture-The-Flag (CTF) competitions, they have two key limitations: accessing latest cybersecurity expertise beyond training data, and integrating new knowledge into complex task planning. Knowledge-based approaches that incorporate technical understanding into the task-solving automation can tackle these limitations. We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms: contextual decomposition of task-critical information, iterative self-reflected knowledge retrieval, and knowledge-hint injection that transforms insights into adaptive attack strategies. Comprehensive evaluations with different configurations show CRAKEN's effectiveness in multi-stage vulnerability detection and exploitation compared to previous approaches. Our extensible architecture establishes new methodologies for embedding new security knowledge into LLM-driven cybersecurity agentic systems. With a knowledge database of CTF writeups, CRAKEN obtained an accuracy of 22% on NYU CTF Bench, outperforming prior works by 3% and achieving state-of-the-art results. On evaluation of MITRE ATT&CK techniques, CRAKEN solves 25-30% more techniques than prior work, demonstrating improved cybersecurity capabilities via knowledge-based execution. We make our framework open source to public https://github.com/NYU-LLM-CTF/nyuctf_agents_craken.
Related papers
- Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report [50.268821168513654]
We present Foundation-Sec-8B, a cybersecurity-focused large language model (LLMs) built on the Llama 3.1 architecture.<n>We evaluate it across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks.<n>By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
arXiv Detail & Related papers (2025-04-28T08:41:12Z) - The Digital Cybersecurity Expert: How Far Have We Come? [49.89857422097055]
We develop CSEBenchmark, a fine-grained cybersecurity evaluation framework based on 345 knowledge points expected of cybersecurity experts.<n>We evaluate 12 popular large language models (LLMs) on CSEBenchmark and find that even the best-performing model achieves only 85.42% overall accuracy.<n>By identifying and addressing specific knowledge gaps in each LLM, we achieve up to an 84% improvement in correcting previously incorrect predictions.
arXiv Detail & Related papers (2025-04-16T05:36:28Z) - Exploring AI-Enabled Cybersecurity Frameworks: Deep-Learning Techniques, GPU Support, and Future Enhancements [0.4419843514606336]
Emerging cybersecurity systems are incorporating AI techniques, specifically deep-learning algorithms, to enhance their ability to detect incidents, analyze alerts, and respond to events.<n>While these techniques offer a promising approach to combating dynamic security threats, they often require significant computational resources.<n>We have identified a total of emphtwo deep-learning algorithms that are utilized by emphthree out of 38 selected cybersecurity frameworks.
arXiv Detail & Related papers (2024-12-17T08:14:12Z) - ChatNVD: Advancing Cybersecurity Vulnerability Assessment with Large Language Models [0.46873264197900916]
ChatNVD is a support tool powered by Large Language Models (LLMs) to generate accessible, context-rich summaries of software vulnerabilities.<n>We develop three variants of ChatNVD, utilizing three prominent LLMs: GPT-4o Mini by OpenAI, LLaMA 3 by Meta, and Gemini 1.5 Pro by Google.<n>Our results demonstrate that GPT-4o Mini outperforms the other models, achieving over 92% accuracy and the lowest error rates.
arXiv Detail & Related papers (2024-12-06T03:45:49Z) - In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [104.94706600050557]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community.<n>We propose ICER, a novel red-teaming framework that generates interpretable and semantic meaningful problematic prompts.<n>Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
arXiv Detail & Related papers (2024-11-25T04:17:24Z) - CTINexus: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.<n>Current CTI knowledge extraction methods lack flexibility and generalizability.<n>We propose CTINexus, a novel framework for data-efficient CTI knowledge extraction and high-quality cybersecurity knowledge graph (CSKG) construction.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - Hacking, The Lazy Way: LLM Augmented Pentesting [0.0]
We introduce a new concept called "LLM Augmented Pentesting" demonstrated with a tool named "Pentest Copilot"<n>Our approach focuses on overcoming the traditional resistance to automation in penetration testing by employing LLMs to automate specific sub-tasks.<n>Pentest Copilot showcases remarkable proficiency in tasks such as utilizing testing tools, interpreting outputs, and suggesting follow-up actions.
arXiv Detail & Related papers (2024-09-14T17:40:35Z) - CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions [0.2999888908665658]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) capabilities, providing versatile capabilities across various applications.
However, their application to complex, domain-specific tasks, such as cyber-security, often faces substantial challenges.
In this study, we introduce SecKnowledge and CyberPal.AI to address these challenges and train security-expert LLMs.
arXiv Detail & Related papers (2024-08-17T22:37:39Z) - PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation [18.432274815853116]
PenHeal is a two-stage LLM-based framework designed to autonomously identify and security vulnerabilities.
This paper introduces PenHeal, a two-stage LLM-based framework designed to autonomously identify and security vulnerabilities.
arXiv Detail & Related papers (2024-07-25T05:42:14Z) - Generative AI in Cybersecurity: A Comprehensive Review of LLM Applications and Vulnerabilities [1.0974825157329373]
This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs)<n>We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection.<n>We present an overview of LLM evolution and its current state, focusing on advancements in models such as GPT-4, GPT-3.5, Mixtral-8x7B, BERT, Falcon2, and LLaMA.
arXiv Detail & Related papers (2024-05-21T13:02:27Z) - Large Language Models for Cyber Security: A Systematic Literature Review [14.924782327303765]
We conduct a comprehensive review of the literature on the application of Large Language Models in cybersecurity (LLM4Security)<n>We observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection.<n>Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training.
arXiv Detail & Related papers (2024-05-08T02:09:17Z) - Towards Automated Classification of Attackers' TTPs by combining NLP
with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z) - RoFL: Attestable Robustness for Secure Federated Learning [59.63865074749391]
Federated Learning allows a large number of clients to train a joint model without the need to share their private data.
To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation.
We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
arXiv Detail & Related papers (2021-07-07T15:42:49Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.