CAI: An Open, Bug Bounty-Ready Cybersecurity AI
- URL: http://arxiv.org/abs/2504.06017v2
- Date: Wed, 09 Apr 2025 13:54:18 GMT
- Title: CAI: An Open, Bug Bounty-Ready Cybersecurity AI
- Authors: Víctor Mayoral-Vilches, Luis Javier Navarrete-Lozano, María Sanz-Gómez, Lidia Salas Espejo, Martiño Crespo-Álvarez, Francisco Oca-Gonzalez, Francesco Balassone, Alfonso Glera-Picón, Unai Ayucar-Carbajo, Jon Ander Ruiz-Alcalde, Stefan Rass, Martin Pinzger, Endika Gil-Uriarte,
- Abstract summary: Cybersecurity AI (CAI) is an open-source framework that democratizes advanced security testing through specialized AI agents. We demonstrate that CAI consistently outperforms state-of-the-art results in CTF benchmarks. CAI reached top-30 in Spain and top-500 worldwide on Hack The Box within a week.
- Score: 0.3889280708089931
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: By 2028 most cybersecurity actions will be autonomous, with humans teleoperating. We present the first classification of autonomy levels in cybersecurity and introduce Cybersecurity AI (CAI), an open-source framework that democratizes advanced security testing through specialized AI agents. Through rigorous empirical evaluation, we demonstrate that CAI consistently outperforms state-of-the-art results in CTF benchmarks, solving challenges across diverse categories with significantly greater efficiency, up to 3,600x faster than humans in specific tasks and averaging 11x faster overall. CAI achieved first place among AI teams and secured a top-20 position worldwide in the "AI vs Human" live CTF challenge, earning a monetary reward of $750. Based on our results, we argue against LLM-vendor claims about limited security capabilities. Beyond cybersecurity competitions, CAI demonstrates real-world effectiveness, reaching top-30 in Spain and top-500 worldwide on Hack The Box within a week, while dramatically reducing security testing costs by an average of 156x. Our framework transcends theoretical benchmarks by enabling non-professionals to discover significant security bugs (CVSS 4.3-7.5) at rates comparable to experts during bug bounty exercises. By combining modular agent design with seamless tool integration and human oversight (HITL), CAI addresses critical market gaps, offering organizations of all sizes access to AI-powered bug bounty security testing previously available only to well-resourced firms, thereby challenging the oligopolistic ecosystem currently dominated by major bug bounty platforms.
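The abstract describes CAI's architecture only at a high level: modular, specialized agents that call external security tools, with a human-in-the-loop (HITL) gate on execution. The sketch below illustrates that pattern in Python; the names (Agent, Tool, run_with_hitl) and the interactive approval step are hypothetical assumptions and are not taken from CAI's actual codebase or API.

```python
# Hypothetical sketch of a modular agent with tool integration and HITL.
# Class and method names here are illustrative, not CAI's real API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A callable the agent may invoke, e.g. a wrapped port scanner."""
    name: str
    run: Callable[[str], str]


@dataclass
class Agent:
    """A specialized security-testing agent: a role plus the tools it may use."""
    role: str
    tools: Dict[str, Tool] = field(default_factory=dict)

    def plan(self, objective: str) -> List[str]:
        # Placeholder planner; a real agent would query an LLM for next steps.
        return [f"{self.role}: enumerate attack surface for '{objective}'"]

    def run_with_hitl(self, objective: str) -> List[str]:
        """Execute planned steps, asking a human to approve each one (HITL)."""
        transcript: List[str] = []
        for step in self.plan(objective):
            approved = input(f"Approve step? [{step}] (y/n) ").strip().lower() == "y"
            transcript.append(("EXECUTED: " if approved else "SKIPPED by human: ") + step)
        return transcript


if __name__ == "__main__":
    recon = Agent(
        role="recon-agent",
        tools={"scan": Tool("scan", run=lambda target: f"simulated scan of {target}")},
    )
    print(recon.run_with_hitl("scope: example.com web tier"))
```

The point of the sketch is the separation of planning, tool invocation, and human approval: keeping oversight in one place is what would let a framework swap in different specialized agents (recon, exploitation, reporting) while preserving HITL control.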
Related papers
- Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - The Singapore Consensus on Global AI Safety Research Priorities [128.58674892183657]
"2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space.<n>Report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments.<n>Report organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment) and challenges with monitoring and intervening after deployment (Control)
arXiv Detail & Related papers (2025-06-25T17:59:50Z) - CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously. Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope. We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z) - Evaluating AI cyber capabilities with crowdsourced elicitation [0.0]
We propose elicitation bounties as a practical mechanism for maintaining timely, cost-effective situational awareness of emerging AI capabilities. Applying METR's methodology, we found that AI agents can reliably solve cyber challenges requiring one hour or less of effort from a median human CTF participant.
arXiv Detail & Related papers (2025-05-26T12:40:32Z) - A Framework for Evaluating Emerging Cyberattack Capabilities of AI [11.595840449117052]
This work introduces a novel evaluation framework that addresses these limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations. We analyzed over 12,000 real-world instances of AI use in cyberattacks catalogued by Google's Threat Intelligence Group. Our benchmark comprises 50 new challenges spanning various cyberattack phases.
arXiv Detail & Related papers (2025-03-14T23:05:02Z) - Quantifying Security Vulnerabilities: A Metric-Driven Security Analysis of Gaps in Current AI Standards [5.388550452190688]
This paper audits and quantifies security risks in three major AI governance standards: NIST AI RMF 1.0, the UK's AI and Data Protection Risk Toolkit, and the EU's ALTAI. Using a novel risk assessment methodology, we develop four key metrics: Risk Severity Index (RSI), Attack Vector Potential Index (AVPI), Compliance-Security Gap Percentage (CSGP), and Root Cause Vulnerability Score (RCVS).
arXiv Detail & Related papers (2025-02-12T17:57:54Z) - Position: A taxonomy for reporting and describing AI security incidents [57.98317583163334]
We argue that specific taxonomies are required to describe and report security incidents of AI systems.
Existing frameworks for either non-AI security or generic AI safety incident reporting are insufficient to capture the specific properties of AI security.
arXiv Detail & Related papers (2024-12-19T13:50:26Z) - AI Cyber Risk Benchmark: Automated Exploitation Capabilities [0.24578723416255752]
We introduce a new benchmark for assessing AI models' capabilities and risks in automated software exploitation. We evaluate several leading language models, including OpenAI's o1-preview and o1-mini, Anthropic's Claude-3.5-sonnet-20241022 and Claude-3.5-sonnet-20240620.
arXiv Detail & Related papers (2024-10-29T10:57:11Z) - Work-in-Progress: Crash Course: Can (Under Attack) Autonomous Driving Beat Human Drivers? [60.51287814584477]
This paper evaluates the inherent risks in autonomous driving by examining the current landscape of AVs.
We develop specific claims highlighting the delicate balance between the advantages of AVs and potential security challenges in real-world scenarios.
arXiv Detail & Related papers (2024-05-14T09:42:21Z) - A Safe Harbor for AI Evaluation and Red Teaming [124.89885800509505]
Some researchers fear that conducting such research or releasing their findings will result in account suspensions or legal reprisal.
We propose that major AI developers commit to providing a legal and technical safe harbor.
We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.
arXiv Detail & Related papers (2024-03-07T20:55:08Z) - Smart Contract and DeFi Security Tools: Do They Meet the Needs of Practitioners? [10.771021805354911]
Attacks targeting smart contracts are increasing, causing an estimated $6.45 billion in financial losses.
We aim to shed light on the effectiveness of automated security tools in identifying vulnerabilities that can lead to high-profile attacks.
Our findings reveal a stark reality: the tools could have prevented a mere 8% of the attacks in our dataset, amounting to $149 million out of the $2.3 billion in losses.
arXiv Detail & Related papers (2023-04-06T10:27:19Z) - Artificial Intelligence Security Competition (AISC) [52.20676747225118]
The Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI.
The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition.
This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
arXiv Detail & Related papers (2022-12-07T02:45:27Z) - SoK: On the Semantic AI Security in Autonomous Driving [42.15658768948801]
Autonomous Driving systems rely on AI components to make safe and correct driving decisions.
For such AI component-level vulnerabilities to be semantically impactful at the system level, non-trivial semantic gaps must be addressed.
In this paper, we define such research space as semantic AI security as opposed to generic AI security.
arXiv Detail & Related papers (2022-03-10T12:00:34Z) - Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop at AAAI 2022 [55.573187938617636]
The workshop will focus on the application of AI to problems in cyber security.
Cyber systems generate large volumes of data, and utilizing this data effectively is beyond human capabilities.
arXiv Detail & Related papers (2022-02-28T18:27:41Z) - Adversarial Attacks on ML Defense Models Competition [82.37504118766452]
The TSAIL group at Tsinghua University and the Alibaba Security group organized this competition.
The purpose of this competition is to motivate novel attack algorithms to evaluate adversarial robustness.
arXiv Detail & Related papers (2021-10-15T12:12:41Z)