Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents
- URL: http://arxiv.org/abs/2510.24317v1
- Date: Tue, 28 Oct 2025 11:36:20 GMT
- Title: Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents
- Authors: María Sanz-Gómez, Víctor Mayoral-Vilches, Francesco Balassone, Luis Javier Navarrete-Lozano, Cristóbal R. J. Veas Chavez, Maite del Mundo de Torres,
- Abstract summary: Existing benchmarks assess isolated skills rather than integrated performance.<n>We present the Cybersecurity AI Benchmark (CAIBench), a modular meta-benchmark framework.<n>We find that proper matches improve up to 2.6$times$ variance in Attack and Defense CTFs.
- Score: 0.36134114973155557
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Cybersecurity spans multiple interconnected domains, complicating the development of meaningful, labor-relevant benchmarks. Existing benchmarks assess isolated skills rather than integrated performance. We find that pre-trained knowledge of cybersecurity in LLMs does not imply attack and defense abilities, revealing a gap between knowledge and capability. To address this limitation, we present the Cybersecurity AI Benchmark (CAIBench), a modular meta-benchmark framework that allows evaluating LLM models and agents across offensive and defensive cybersecurity domains, taking a step towards meaningfully measuring their labor-relevance. CAIBench integrates five evaluation categories, covering over 10,000 instances: Jeopardy-style CTFs, Attack and Defense CTFs, Cyber Range exercises, knowledge benchmarks, and privacy assessments. Key novel contributions include systematic simultaneous offensive-defensive evaluation, robotics-focused cybersecurity challenges (RCTF2), and privacy-preserving performance assessment (CyberPII-Bench). Evaluation of state-of-the-art AI models reveals saturation on security knowledge metrics (~70\% success) but substantial degradation in multi-step adversarial (A\&D) scenarios (20-40\% success), or worse in robotic targets (22\% success). The combination of framework scaffolding and LLM model choice significantly impacts performance; we find that proper matches improve up to 2.6$\times$ variance in Attack and Defense CTFs. These results demonstrate a pronounced gap between conceptual knowledge and adaptive capability, emphasizing the need for a meta-benchmark.
Related papers
- Can AI Lower the Barrier to Cybersecurity? A Human-Centered Mixed-Methods Study of Novice CTF Learning [0.0]
Agentic AI frameworks for cybersecurity promise to lower barriers by automating and coordinating penetration testing tasks.<n>We present a human-centered, mixed-methods case study examining how agentic AI frameworks mediates novice entry into CTF-based penetration testing.
arXiv Detail & Related papers (2026-02-20T12:20:36Z) - Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse [50.87630846876635]
We develop nine detailed cyber risk models.<n>Each model decomposes attacks into steps using the MITRE ATT&CK framework.<n>Individual estimates are aggregated through Monte Carlo simulation.
arXiv Detail & Related papers (2025-12-09T17:54:17Z) - CyberSOCEval: Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning [1.3863707631653515]
Cyber defenders are overwhelmed by a deluge of security alerts, threat intelligence signals, and shifting business context.<n>Existing evaluations do not fully assess the scenarios most relevant to real-world defenders.<n>We introduce CyberSOCEval, a new suite of open source benchmarks within CyberSecEval 4.
arXiv Detail & Related papers (2025-09-24T14:33:07Z) - A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality.<n>Model Extraction Attacks pose threats to intellectual property, privacy, and system security.<n>We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z) - The Application of Transformer-Based Models for Predicting Consequences of Cyber Attacks [0.4604003661048266]
Threat Modeling can provide critical support to cybersecurity professionals, enabling them to take timely action and allocate resources that could be used elsewhere.<n>Recently, there has been a pressing need for automated methods to assess attack descriptions and forecast the future consequences of cyberattacks.<n>This study examines how Natural Language Processing (NLP) and deep learning can be applied to analyze the potential impact of cyberattacks.
arXiv Detail & Related papers (2025-08-18T15:46:36Z) - CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale [45.97598662617568]
We introduce CyberGym, a large-scale benchmark featuring 1,507 real-world vulnerabilities across 188 software projects.<n>We show that CyberGym leads to the discovery of 35 zero-day vulnerabilities and 17 historically incomplete patches.<n>These results underscore that CyberGym is not only a robust benchmark for measuring AI's progress in cybersecurity but also a platform for creating direct, real-world security impact.
arXiv Detail & Related papers (2025-06-03T07:35:14Z) - The Digital Cybersecurity Expert: How Far Have We Come? [49.89857422097055]
We develop CSEBenchmark, a fine-grained cybersecurity evaluation framework based on 345 knowledge points expected of cybersecurity experts.<n>We evaluate 12 popular large language models (LLMs) on CSEBenchmark and find that even the best-performing model achieves only 85.42% overall accuracy.<n>By identifying and addressing specific knowledge gaps in each LLM, we achieve up to an 84% improvement in correcting previously incorrect predictions.
arXiv Detail & Related papers (2025-04-16T05:36:28Z) - AttackSeqBench: Benchmarking Large Language Models in Analyzing Attack Sequences within Cyber Threat Intelligence [17.234214109636113]
Cyber Threat Intelligence (CTI) reports document observations of cyber threats, synthesizing evidence about adversaries' actions and intent into actionable knowledge.<n>The unstructured and verbose nature of CTI reports poses significant challenges for security practitioners to manually extract and analyze such sequences.<n>Although large language models (LLMs) exhibit promise in cybersecurity tasks such as entity extraction and knowledge graph construction, their understanding and reasoning capabilities towards behavioral sequences remains underexplored.
arXiv Detail & Related papers (2025-03-05T04:25:21Z) - OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities [0.0]
We demonstrate a new approach to assessing AI's progress towards enabling and scaling real-world offensive cyber operations.<n>We detail OCCULT, a lightweight operational evaluation framework that allows cyber security experts to contribute to rigorous and repeatable measurement.<n>We find that there has been significant recent advancement in the risks of AI being used to scale realistic cyber threats.
arXiv Detail & Related papers (2025-02-18T19:33:14Z) - SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI [58.29510889419971]
Existing benchmarks for evaluating the security risks and capabilities of code-generating large language models (LLMs) face several key limitations.<n>We introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations.<n>Applying this framework to Python, C/C++, and Java, we build SeCodePLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities.
arXiv Detail & Related papers (2024-10-14T21:17:22Z) - MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks [65.86360607693457]
No-box attacks, where adversaries have no prior knowledge, remain relatively underexplored despite its practical relevance.<n>This work presents a systematic investigation into leveraging large-scale Vision-Language Models (VLMs) as surrogate models for executing no-box attacks.<n>Our theoretical and empirical analyses reveal a key limitation in the execution of no-box attacks stemming from insufficient discriminative capabilities for direct application of vanilla CLIP as a surrogate model.<n>We propose MF-CLIP: a novel framework that enhances CLIP's effectiveness as a surrogate model through margin-aware feature space optimization.
arXiv Detail & Related papers (2023-07-13T08:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.