Cybersecurity AI: Evaluating Agentic Cybersecurity in Attack/Defense CTFs
- URL: http://arxiv.org/abs/2510.17521v1
- Date: Mon, 20 Oct 2025 13:21:09 GMT
- Title: Cybersecurity AI: Evaluating Agentic Cybersecurity in Attack/Defense CTFs
- Authors: Francesco Balassone, VĂctor Mayoral-Vilches, Stefan Rass, Martin Pinzger, Gaetano Perrone, Simon Pietro Romano, Peter Schartner,
- Abstract summary: We evaluate whether AI systems are more effective at attacking or defending in cybersecurity.<n> Statistical analysis reveals defensive agents achieve 54.3% unconstrained patching success.<n>Findings underscore the urgency for defenders to adopt open-source Cybersecurity AI frameworks.
- Score: 3.6968315805917897
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We empirically evaluate whether AI systems are more effective at attacking or defending in cybersecurity. Using CAI (Cybersecurity AI)'s parallel execution framework, we deployed autonomous agents in 23 Attack/Defense CTF battlegrounds. Statistical analysis reveals defensive agents achieve 54.3% unconstrained patching success versus 28.3% offensive initial access (p=0.0193), but this advantage disappears under operational constraints: when defense requires maintaining availability (23.9%) and preventing all intrusions (15.2%), no significant difference exists (p>0.05). Exploratory taxonomy analysis suggests potential patterns in vulnerability exploitation, though limited sample sizes preclude definitive conclusions. This study provides the first controlled empirical evidence challenging claims of AI attacker advantage, demonstrating that defensive effectiveness critically depends on success criteria, a nuance absent from conceptual analyses but essential for deployment. These findings underscore the urgency for defenders to adopt open-source Cybersecurity AI frameworks to maintain security equilibrium against accelerating offensive automation.
Related papers
- To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack [14.333336222782856]
AI agents automate vulnerability discovery and exploitation across thousands of targets.<n>Current developers focus on preventing misuse through data filtering, safety alignment, and output guardrails.<n>We argue that AI-agent-driven cyber attacks are inevitable, requiring a fundamental shift in defensive strategy.
arXiv Detail & Related papers (2026-02-01T12:37:55Z) - AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications [71.27518152526686]
Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation.<n>LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task.<n>This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types.
arXiv Detail & Related papers (2025-12-23T08:42:09Z) - Categorical Framework for Quantum-Resistant Zero-Trust AI Security [0.0]
We present a novel integration of post-quantum cryptography (PQC) and zero trust architecture (AZT) to secure AI model.<n>Our framework uniquely models cryptographic access as morphisms and trust policies as functors.<n>We demonstrate its efficacy through a concrete ESP32-based implementation.
arXiv Detail & Related papers (2025-11-25T17:17:24Z) - Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models [0.0]
Large language models (LLMs) remain vulnerable to sophisticated prompt engineering attacks.<n>We introduce Jailbreak Mimicry, a systematic methodology for training compact attacker models to automatically generate narrative-based jailbreak prompts.<n>Our approach transforms adversarial prompt discovery from manual craftsmanship into a reproducible scientific process.
arXiv Detail & Related papers (2025-10-24T23:53:16Z) - The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections [74.60337113759313]
Current defenses against jailbreaks and prompt injections are typically evaluated against a static set of harmful attack strings.<n>We argue that this evaluation process is flawed. Instead, we should evaluate defenses against adaptive attackers who explicitly modify their attack strategy to counter a defense's design.
arXiv Detail & Related papers (2025-10-10T05:51:04Z) - CIA+TA Risk Assessment for AI Reasoning Vulnerabilities [0.0]
We present a framework for cognitive cybersecurity, a systematic protection of AI reasoning processes from adversarial manipulation.<n>First, we establish cognitive cybersecurity as a discipline complementing traditional cybersecurity and AI safety.<n>Second, we introduce the CIA+TA, extending traditional Confidentiality, Integrity, and Availability with Trust.<n>Third, we present a quantitative risk assessment methodology with empirically-derived coefficients, enabling organizations to measure cognitive security risks.
arXiv Detail & Related papers (2025-08-19T13:56:09Z) - Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios.<n>We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models.<n>Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale [45.97598662617568]
We introduce CyberGym, a large-scale benchmark featuring 1,507 real-world vulnerabilities across 188 software projects.<n>We show that CyberGym leads to the discovery of 35 zero-day vulnerabilities and 17 historically incomplete patches.<n>These results underscore that CyberGym is not only a robust benchmark for measuring AI's progress in cybersecurity but also a platform for creating direct, real-world security impact.
arXiv Detail & Related papers (2025-06-03T07:35:14Z) - Frontier AI's Impact on the Cybersecurity Landscape [46.32458228179959]
We find that while AI is already widely used in attacks, its application in defense remains limited.<n>Experts expect AI to continue favoring attackers over defenders, though the gap will gradually narrow.
arXiv Detail & Related papers (2025-04-07T18:25:18Z) - Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents [36.49717045080722]
This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios.<n>We introduce the concept of context manipulation -- a comprehensive attack vector that exploits unprotected context surfaces.<n>Using ElizaOS, we showcase that malicious injections into prompts or historical records can trigger unauthorized asset transfers and protocol violations.
arXiv Detail & Related papers (2025-03-20T15:44:31Z) - A Framework for Evaluating Emerging Cyberattack Capabilities of AI [11.595840449117052]
This work introduces a novel evaluation framework that addresses limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations.<n>We analyzed over 12,000 real-world instances of AI involvement in cyber incidents, catalogued by Google's Threat Intelligence Group, to curate seven representative attack chain archetypes.<n>We report on AI's potential to amplify offensive capabilities across specific attack stages, and offer recommendations for prioritizing defenses.
arXiv Detail & Related papers (2025-03-14T23:05:02Z) - G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks
through Attributed Client Graph Clustering [116.4277292854053]
Federated Learning (FL) offers collaborative model training without data sharing.
FL is vulnerable to backdoor attacks, where poisoned model weights lead to compromised system integrity.
We present G$2$uardFL, a protective framework that reinterprets the identification of malicious clients as an attributed graph clustering problem.
arXiv Detail & Related papers (2023-06-08T07:15:04Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.