Cybersecurity AI in OT: Insights from an AI Top-10 Ranker in the Dragos OT CTF 2025
- URL: http://arxiv.org/abs/2511.05119v1
- Date: Fri, 07 Nov 2025 10:04:11 GMT
- Title: Cybersecurity AI in OT: Insights from an AI Top-10 Ranker in the Dragos OT CTF 2025
- Authors: Víctor Mayoral-Vilches, Luis Javier Navarrete-Lozano, Francesco Balassone, María Sanz-Gómez, Cristóbal Ricardo Veas Chávez, Maite del Mundo de Torres
- Abstract summary: We examine the performance of Cybersecurity AI (CAI) during the Dragos OT CTF 2025 -- a 48-hour industrial control system (ICS) competition with more than 1,000 teams. Using CAI telemetry and official leaderboard data, we quantify CAI's trajectory relative to the leading human-operated teams.
- Score: 0.36134114973155557
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Operational Technology (OT) cybersecurity increasingly relies on rapid response across malware analysis, network forensics, and reverse engineering disciplines. We examine the performance of Cybersecurity AI (CAI), powered by the alias1 model, during the Dragos OT CTF 2025 -- a 48-hour industrial control system (ICS) competition with more than 1,000 teams. Using CAI telemetry and official leaderboard data, we quantify CAI's trajectory relative to the leading human-operated teams. CAI reached Rank 1 between competition hours 7.0 and 8.0, crossed 10,000 points at 5.42 hours (1,846 pts/h), and completed 32 of the competition's 34 challenges before automated operations were paused at hour 24 with a final score of 18,900 points (6th place). The top-3 human teams solved 33 of 34 challenges, collectively leaving only the 600-point "Kiddy Tags -- 1" unsolved; they were also the only teams to clear the 1,000-point "Moot Force" binary. The top-5 human teams averaged 1,347 pts/h to the same milestone, marking a 37% velocity advantage for CAI. We analyse time-resolved scoring, category coverage, and solve cadence. The evidence indicates that a mission-configured AI agent can match or exceed expert human crews in early-phase OT incident response while remaining subject to practical limits in sustained, multi-day operations.
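The abstract's headline velocity figures can be cross-checked from its own numbers; the snippet below is a minimal sketch of that arithmetic, using only values quoted in the abstract (small rounding differences from the paper's 1,846 pts/h figure are expected).

```python
# Cross-check the scoring-velocity figures quoted in the abstract.
# All inputs are taken directly from the abstract text.

cai_points = 10_000          # CAI's milestone score
cai_hours = 5.42             # time to reach it
cai_rate = cai_points / cai_hours   # computed rate in pts/h (abstract rounds to 1,846)

top5_rate = 1_347            # top-5 human average to the same milestone, pts/h
advantage_pct = (1_846 / top5_rate - 1) * 100   # CAI's relative velocity advantage

print(f"CAI rate: {cai_rate:.0f} pts/h")
print(f"Velocity advantage over top-5 humans: {advantage_pct:.0f}%")
```

Running this reproduces roughly 1,845 pts/h and a 37% advantage, consistent with the abstract's claims.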
Related papers
- Towards Cybersecurity Superintelligence: from AI-guided humans to human-guided AI [1.8791797720038008]
Cybersecurity superintelligence is artificial intelligence exceeding the best human capability in both speed and strategic reasoning. This paper documents the emergence of such capability through three major contributions that have pioneered the field of AI Security.
arXiv Detail & Related papers (2026-01-21T03:12:48Z) - Cybersecurity AI: A Game-Theoretic AI for Guiding Attack and Defense [1.0933254855925085]
Generative Cut-the-Rope (G-CTR) is a game-theoretic guidance layer that extracts attack graphs from an agent's context. In five real-world exercises, G-CTR matches 70-90% of expert graph structure while running 60-245x faster and over 140x cheaper than manual analysis.
arXiv Detail & Related papers (2026-01-09T16:06:10Z) - Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing [83.48116811975787]
We present the first comprehensive evaluation of AI agents against human cybersecurity professionals. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold. ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate.
arXiv Detail & Related papers (2025-12-10T18:12:29Z) - Cybersecurity AI: The World's Top AI Agent for Security Capture-the-Flag (CTF) [0.3440866754277105]
In 2025, Cybersecurity AI (CAI) systematically conquered some of the world's most prestigious hacking competitions. This paper presents comprehensive evidence of AI capability across the 2025 CTF circuit. It argues that the security community must urgently transition from Jeopardy-style contests to Attack & Defense formats.
arXiv Detail & Related papers (2025-12-02T11:15:44Z) - The 9th AI City Challenge [64.32227009699942]
The ninth AI City Challenge continues to advance real-world applications of computer vision and AI in transportation, industrial automation, and public safety. The 2025 edition featured four tracks and saw a 17% increase in participation, with 245 teams from 15 countries registered on the evaluation server. Public release of challenge datasets led to over 30,000 downloads to date.
arXiv Detail & Related papers (2025-08-19T06:55:06Z) - Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - Evaluating AI cyber capabilities with crowdsourced elicitation [0.0]
We propose elicitation bounties as a practical mechanism for maintaining timely, cost-effective situational awareness of emerging AI capabilities. Applying METR's methodology, we found that AI agents can reliably solve cyber challenges requiring one hour or less of effort from a median human CTF participant.
arXiv Detail & Related papers (2025-05-26T12:40:32Z) - CAI: An Open, Bug Bounty-Ready Cybersecurity AI [0.3889280708089931]
Cybersecurity AI (CAI) is an open-source framework that democratizes advanced security testing through specialized AI agents. We demonstrate that CAI consistently outperforms state-of-the-art results in CTF benchmarks. CAI reached top-30 in Spain and top-500 worldwide on Hack The Box within a week.
arXiv Detail & Related papers (2025-04-08T13:22:09Z) - HCAST: Human-Calibrated Autonomy Software Tasks [1.5287939112540956]
We present HCAST, a benchmark of 189 machine learning engineering, cybersecurity, software engineering, and general reasoning tasks. We estimate that HCAST tasks take humans between one minute and 8+ hours. We evaluate the success rates of AI agents built on frontier foundation models.
arXiv Detail & Related papers (2025-03-21T17:54:01Z) - The 8th AI City Challenge [57.25825945041515]
The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions.
The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks.
arXiv Detail & Related papers (2024-04-15T03:12:17Z) - The 7th AI City Challenge [87.23137854688389]
The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence.
The 2023 challenge had five tracks, which drew a record-breaking number of participation requests from 508 teams across 46 countries.
The participating teams' top performances established strong baselines and even outperformed the state-of-the-art in the proposed challenge tracks.
arXiv Detail & Related papers (2023-04-15T08:02:16Z) - Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks. Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z) - The 6th AI City Challenge [91.65782140270152]
The 4 challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries.
The top performance of participating teams established strong baselines and even outperformed the state-of-the-art in the proposed challenge tracks.
arXiv Detail & Related papers (2022-04-21T19:24:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.