Cybersecurity AI: A Game-Theoretic AI for Guiding Attack and Defense
- URL: http://arxiv.org/abs/2601.05887v1
- Date: Fri, 09 Jan 2026 16:06:10 GMT
- Title: Cybersecurity AI: A Game-Theoretic AI for Guiding Attack and Defense
- Authors: Víctor Mayoral-Vilches, María Sanz-Gómez, Francesco Balassone, Stefan Rass, Lidia Salas-Espejo, Benjamin Jablonski, Luis Javier Navarrete-Lozano, Maite del Mundo de Torres, Cristóbal R. J. Veas Chavez
- Abstract summary: Generative Cut-the-Rope (G-CTR) is a game-theoretic guidance layer that extracts attack graphs from the agent's context. In five real-world exercises, G-CTR matches 70--90% of expert graph structure while running 60--245x faster and over 140x cheaper than manual analysis.
- Score: 1.0933254855925085
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: AI-driven penetration testing now executes thousands of actions per hour but still lacks the strategic intuition humans apply in competitive security. To build cybersecurity superintelligence (Cybersecurity AI exceeding the best human capability), such strategic intuition must be embedded into agentic reasoning processes. We present Generative Cut-the-Rope (G-CTR), a game-theoretic guidance layer that extracts attack graphs from the agent's context, computes Nash equilibria with effort-aware scoring, and feeds a concise digest back into the LLM loop, guiding the agent's actions. Across five real-world exercises, G-CTR matches 70--90% of expert graph structure while running 60--245x faster and over 140x cheaper than manual analysis. In a 44-run cyber-range, adding the digest lifts success from 20.0% to 42.9%, cuts cost-per-success by 2.7x, and reduces behavioral variance by 5.2x. In Attack-and-Defense exercises, a shared digest produces a Purple agent that wins roughly 2:1 over the LLM-only baseline and 3.7:1 over independently guided teams. This closed-loop guidance is what produces the breakthrough: it reduces ambiguity, collapses the LLM's search space, suppresses hallucinations, and keeps the model anchored to the most relevant parts of the problem, yielding large gains in success rate, consistency, and reliability.
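The abstract does not specify how G-CTR's equilibria are computed, so the following is only a minimal sketch of the general idea of "Nash equilibria with effort-aware scoring": a toy zero-sum game between an attacker choosing among attack paths and a defender choosing which edge to harden, solved approximately with fictitious play. The payoff matrix, path/edge names, and all numbers are made-up assumptions, not values from the paper.

```python
# Hypothetical illustration only: approximate the mixed Nash equilibrium of a
# small zero-sum attack/defense game with fictitious play (pure Python).

def fictitious_play(payoff, iters=20000):
    """payoff[i][j] = attacker's expected gain when the attacker commits to
    path i and the defender hardens edge j. Returns (attacker mix,
    defender mix, approximate game value)."""
    m, n = len(payoff), len(payoff[0])
    row_counts = [0] * m          # how often each attacker path was played
    col_counts = [0] * n          # how often each defender edge was hardened
    row_best, col_best = 0, 0     # arbitrary initial pure strategies
    for _ in range(iters):
        row_counts[row_best] += 1
        col_counts[col_best] += 1
        # Attacker best-responds (maximizes) to the defender's empirical mix.
        row_vals = [sum(payoff[i][j] * col_counts[j] for j in range(n))
                    for i in range(m)]
        row_best = max(range(m), key=lambda i: row_vals[i])
        # Defender best-responds (minimizes) to the attacker's empirical mix.
        col_vals = [sum(payoff[i][j] * row_counts[i] for i in range(m))
                    for j in range(n)]
        col_best = min(range(n), key=lambda j: col_vals[j])
    x = [c / iters for c in row_counts]
    y = [c / iters for c in col_counts]
    value = sum(payoff[i][j] * x[i] * y[j]
                for i in range(m) for j in range(n))
    return x, y, value

# Two attack paths vs. two hardening choices. Entries are success
# probabilities already discounted by attacker effort (invented numbers
# standing in for the paper's "effort-aware scoring").
P = [[0.9, 0.2],
     [0.3, 0.8]]
x, y, v = fictitious_play(P)
```

For this matrix the exact equilibrium value is 0.55 (defender hardens each edge with probability 0.5), and fictitious play converges close to it; in a real system the resulting mixed strategies, not the toy numbers, would be summarized into the digest fed back to the LLM.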
Related papers
- Towards Cybersecurity Superintelligence: from AI-guided humans to human-guided AI [1.8791797720038008]
Cybersecurity superintelligence is artificial intelligence exceeding the best human capability in both speed and strategic reasoning. This paper documents the emergence of such capability through three major contributions that have pioneered the field of AI Security.
arXiv Detail & Related papers (2026-01-21T03:12:48Z) - AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications [71.27518152526686]
Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task. This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types.
arXiv Detail & Related papers (2025-12-23T08:42:09Z) - Cybersecurity AI: The World's Top AI Agent for Security Capture-the-Flag (CTF) [0.3440866754277105]
In 2025, Cybersecurity AI (CAI) systematically conquered some of the world's most prestigious hacking competitions. This paper presents comprehensive evidence of AI capability across the 2025 CTF circuit. It argues that the security community must urgently transition from Jeopardy-style contests to Attack & Defense formats.
arXiv Detail & Related papers (2025-12-02T11:15:44Z) - Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents [0.36134114973155557]
Existing benchmarks assess isolated skills rather than integrated performance. We present the Cybersecurity AI Benchmark (CAIBench), a modular meta-benchmark framework. We find that proper model-task matches improve performance by up to 2.6x in Attack and Defense CTFs.
arXiv Detail & Related papers (2025-10-28T11:36:20Z) - Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People [81.63702981397408]
Given limited resources, to what extent do agents based on language models (LMs) act rationally? We develop methods to benchmark and enhance agentic information-seeking, drawing on insights from human behavior. For Spotter agents, our approach boosts accuracy by up to 14.7% absolute over LM-only baselines; for Captain agents, it raises expected information gain (EIG) by up to 0.227 bits (94.2% of the achievable noise ceiling).
arXiv Detail & Related papers (2025-10-23T17:57:28Z) - Cybersecurity AI: Evaluating Agentic Cybersecurity in Attack/Defense CTFs [3.6968315805917897]
We evaluate whether AI systems are more effective at attacking or defending in cybersecurity. Statistical analysis reveals defensive agents achieve 54.3% unconstrained patching success. Findings underscore the urgency for defenders to adopt open-source Cybersecurity AI frameworks.
arXiv Detail & Related papers (2025-10-20T13:21:09Z) - Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models [64.47869632167284]
Conventional language model (LM) safety alignment relies on a reactive, disjoint procedure: attackers exploit a static model, followed by defensive fine-tuning to patch exposed vulnerabilities. This sequential approach creates a mismatch: attackers overfit to obsolete defenses, while defenders perpetually lag behind emerging threats. We propose Self-RedTeam, an online self-play reinforcement learning algorithm where an attacker and defender agent co-evolve through continuous interaction.
arXiv Detail & Related papers (2025-06-09T06:35:12Z) - Evaluating AI cyber capabilities with crowdsourced elicitation [0.0]
We propose elicitation bounties as a practical mechanism for maintaining timely, cost-effective situational awareness of emerging AI capabilities. Applying METR's methodology, we found that AI agents can reliably solve cyber challenges requiring one hour or less of effort from a median human CTF participant.
arXiv Detail & Related papers (2025-05-26T12:40:32Z) - Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Composite Adversarial Attacks [57.293211764569996]
Adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching for the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.