Cybersecurity AI: The World's Top AI Agent for Security Capture-the-Flag (CTF)
- URL: http://arxiv.org/abs/2512.02654v1
- Date: Tue, 02 Dec 2025 11:15:44 GMT
- Title: Cybersecurity AI: The World's Top AI Agent for Security Capture-the-Flag (CTF)
- Authors: Víctor Mayoral-Vilches, Luis Javier Navarrete-Lozano, Francesco Balassone, María Sanz-Gómez, Cristóbal R. J. Veas Chavez, Maite del Mundo de Torres, Vanesa Turiel
- Abstract summary: In 2025, Cybersecurity AI (CAI) systematically conquered some of the world's most prestigious hacking competitions. This paper presents comprehensive evidence of AI capability across the 2025 CTF circuit. It argues that the security community must urgently transition from Jeopardy-style contests to Attack & Defense formats.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Are Capture-the-Flag competitions obsolete? In 2025, Cybersecurity AI (CAI) systematically conquered some of the world's most prestigious hacking competitions, achieving Rank #1 at multiple events and consistently outperforming thousands of human teams. Across five major circuits (HTB's AI vs Humans, Cyber Apocalypse with 8,129 teams, Dragos OT CTF, UWSP Pointer Overflow, and the Neurogrid CTF showdown), CAI demonstrated that Jeopardy-style CTFs have become a solved game for well-engineered AI agents. At Neurogrid, CAI captured 41/45 flags to claim the $50,000 top prize; at Dragos OT, it sprinted 37% faster to 10K points than elite human teams; even when deliberately paused mid-competition, it maintained top-tier rankings. Critically, CAI achieved this dominance through our specialized alias1 model architecture, which delivers enterprise-scale AI security operations at unprecedented cost efficiency and with augmented autonomy, reducing 1B token inference costs from $5,940 to just $119 and making continuous security agent operation financially viable for the first time. These results force an uncomfortable reckoning: if autonomous agents now dominate competitions designed to identify top security talent at negligible cost, what are CTFs actually measuring? This paper presents comprehensive evidence of AI capability across the 2025 CTF circuit and argues that the security community must urgently transition from Jeopardy-style contests to Attack & Defense formats that genuinely test adaptive reasoning and resilience, capabilities that remain uniquely human, for now.
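The cost claim above can be sanity-checked with simple arithmetic. A minimal sketch, using only the two dollar figures quoted in the abstract ($5,940 and $119 per 1B tokens); the per-million-token breakdown is derived here and is not stated in the paper:

```python
# Cost arithmetic for the abstract's claim of reducing 1B-token
# inference costs from $5,940 to $119 with the alias1 model.
TOKENS = 1_000_000_000  # one billion tokens

baseline_cost_usd = 5940.0  # reported cost per 1B tokens before alias1
alias1_cost_usd = 119.0     # reported cost per 1B tokens with alias1

# Overall reduction factor (~50x).
reduction = baseline_cost_usd / alias1_cost_usd

# Derived per-million-token prices (an assumption of linear pricing,
# not a figure from the paper).
per_million_baseline = baseline_cost_usd / (TOKENS / 1_000_000)
per_million_alias1 = alias1_cost_usd / (TOKENS / 1_000_000)

print(f"reduction factor: ~{reduction:.0f}x")            # ~50x
print(f"baseline: ${per_million_baseline:.2f}/M tokens") # $5.94/M
print(f"alias1:   ${per_million_alias1:.3f}/M tokens")   # $0.119/M
```

At roughly $0.12 per million tokens, a continuously running agent consuming tens of millions of tokens per day costs dollars rather than hundreds of dollars, which is the basis for the "financially viable for the first time" claim.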
Related papers
- Towards Cybersecurity Superintelligence: from AI-guided humans to human-guided AI [1.8791797720038008]
Cybersecurity superintelligence is artificial intelligence exceeding the best human capability in both speed and strategic reasoning. This paper documents the emergence of such capability through three major contributions that have pioneered the field of AI Security.
arXiv Detail & Related papers (2026-01-21T03:12:48Z) - Cybersecurity AI: A Game-Theoretic AI for Guiding Attack and Defense [1.0933254855925085]
Generative Cut-the-Rope (G-CTR) is a game-theoretic guidance layer that extracts attack graphs from an agent's context. In five real-world exercises, G-CTR matches 70--90% of expert graph structure while running 60--245x faster and over 140x cheaper than manual analysis.
arXiv Detail & Related papers (2026-01-09T16:06:10Z) - Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing [83.48116811975787]
We present the first comprehensive evaluation of AI agents against human cybersecurity professionals. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold. ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate.
arXiv Detail & Related papers (2025-12-10T18:12:29Z) - Cybersecurity AI in OT: Insights from an AI Top-10 Ranker in the Dragos OT CTF 2025 [0.36134114973155557]
We examine the performance of Cybersecurity AI (CAI) during the Dragos OT CTF 2025 -- a 48-hour industrial control system (ICS) competition with more than 1,000 teams. Using CAI telemetry and official leaderboard data, we quantify CAI's trajectory relative to the leading human-operated teams.
arXiv Detail & Related papers (2025-11-07T10:04:11Z) - Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - Evaluating AI cyber capabilities with crowdsourced elicitation [0.0]
We propose elicitation bounties as a practical mechanism for maintaining timely, cost-effective situational awareness of emerging AI capabilities. Applying METR's methodology, we found that AI agents can reliably solve cyber challenges requiring one hour or less of effort from a median human CTF participant.
arXiv Detail & Related papers (2025-05-26T12:40:32Z) - CAI: An Open, Bug Bounty-Ready Cybersecurity AI [0.3889280708089931]
Cybersecurity AI (CAI) is an open-source framework that democratizes advanced security testing through specialized AI agents. We demonstrate that CAI consistently outperforms state-of-the-art results in CTF benchmarks. CAI reached top-30 in Spain and top-500 worldwide on Hack The Box within a week.
arXiv Detail & Related papers (2025-04-08T13:22:09Z) - Superintelligence Strategy: Expert Version [64.7113737051525]
Destabilizing AI developments could raise the odds of great-power conflict. Superintelligence -- AI vastly better than humans at nearly all cognitive tasks -- is now anticipated by AI researchers. We introduce the concept of Mutual Assured AI Malfunction.
arXiv Detail & Related papers (2025-03-07T17:53:24Z) - Artificial Intelligence Security Competition (AISC) [52.20676747225118]
The Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI.
The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition.
This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
arXiv Detail & Related papers (2022-12-07T02:45:27Z) - Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks. We then conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z) - Adversarial Policies Beat Superhuman Go AIs [54.15639517188804]
We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it.
Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders.
Our results demonstrate that even superhuman AI systems may harbor surprising failure modes.
arXiv Detail & Related papers (2022-11-01T03:13:20Z)