Dynamic Risk Assessments for Offensive Cybersecurity Agents
- URL: http://arxiv.org/abs/2505.18384v3
- Date: Wed, 16 Jul 2025 17:51:06 GMT
- Title: Dynamic Risk Assessments for Offensive Cybersecurity Agents
- Authors: Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson
- Abstract summary: We argue that assessments should take into account the varying degrees of freedom that an adversary may possess. We show that adversaries can improve an agent's cybersecurity capability on InterCode CTF by more than 40% relative to the baseline.
- Score: 8.009741580969873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models are increasingly becoming better autonomous programmers, raising the prospect that they could also automate dangerous offensive cyber-operations. Current frontier model audits probe the cybersecurity risks of such agents, but most fail to account for the degrees of freedom available to adversaries in the real world. In particular, with strong verifiers and financial incentives, agents for offensive cybersecurity are amenable to iterative improvement by would-be adversaries. We argue that assessments should take into account an expanded threat model in the context of cybersecurity, emphasizing the varying degrees of freedom that an adversary may possess in stateful and non-stateful environments within a fixed compute budget. We show that even with a relatively small compute budget (8 H100 GPU Hours in our study), adversaries can improve an agent's cybersecurity capability on InterCode CTF by more than 40% relative to the baseline -- without any external assistance. These results highlight the need to evaluate agents' cybersecurity risk in a dynamic manner, painting a more representative picture of risk.
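To make the expanded threat model concrete, here is a minimal sketch of the kind of verifier-driven, fixed-budget iteration the abstract describes. All names here (`run_agent`, `verifier_score`, the wall-clock budget accounting) are hypothetical placeholders, not the paper's actual harness:

```python
# Hypothetical sketch: an adversary repeatedly reruns an agent on a CTF task,
# keeping the best-scoring attempt, until a fixed compute budget is exhausted.
# `run_agent` and `verifier_score` are placeholders, not APIs from the paper.
import time

def run_agent(task: str, hint: str | None = None) -> str:
    """Placeholder: one rollout of an offensive-cyber agent on a task."""
    return f"transcript for {task} (hint={hint})"

def verifier_score(task: str, transcript: str) -> float:
    """Placeholder: a strong verifier, e.g. did the agent capture the flag."""
    return 0.0

def improve_under_budget(task: str, budget_seconds: float) -> tuple[str, float]:
    best_transcript, best_score = "", float("-inf")
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        # Each iteration may condition on the best attempt so far (self-refinement).
        transcript = run_agent(task, hint=best_transcript or None)
        score = verifier_score(task, transcript)
        if score > best_score:
            best_transcript, best_score = transcript, score
    return best_transcript, best_score
```

The key point the sketch captures is that the evaluated quantity is not a single rollout's success rate but the best outcome an adversary can buy with a bounded budget.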
Related papers
- Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z)
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
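As a rough illustration of the hybrid detection idea (not OpenAgentSafety's actual interface), cheap rule-based checks can screen for overt violations while an LLM judge handles subtler cases; the patterns and the `llm_judge` stub below are assumptions:

```python
# Hedged sketch of rule-based screening plus LLM-as-judge escalation.
import re

OVERT_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),           # destructive shell command
    re.compile(r"curl[^|]*\|\s*(ba)?sh"),  # piping a download into a shell
]

def rule_based_flag(action: str) -> bool:
    """Cheap first pass: does the action match any overtly unsafe pattern?"""
    return any(p.search(action) for p in OVERT_PATTERNS)

def llm_judge(action: str, context: str) -> bool:
    """Placeholder: query a judge model with the action and its context."""
    return False  # a real implementation would parse the judge's verdict

def is_unsafe(action: str, context: str) -> bool:
    if rule_based_flag(action):        # overt unsafe behavior
        return True
    return llm_judge(action, context)  # subtle cases go to the judge
```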
arXiv Detail & Related papers (2025-07-08T16:18:54Z)
- SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents. It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information. It also designs an automated assisted assessment scheme based on a large language model.
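A toy sketch of chain-level risk scoring, assuming (this is our illustration, not SafeMobile's mechanism) that risk accumulates over the behavioral sequence, so individually benign steps can still form a risky chain:

```python
# Illustrative only: per-step weights and the escalation factor are invented.
RISKY_STEP_WEIGHTS = {
    "open_settings": 0.1,
    "grant_permission": 0.4,
    "read_sms": 0.5,
    "send_data_external": 0.8,
}

def chain_risk(actions: list[str], escalation: float = 1.2) -> float:
    """Accumulate risk along the chain; each risky step amplifies the next."""
    risk, multiplier = 0.0, 1.0
    for action in actions:
        step = RISKY_STEP_WEIGHTS.get(action, 0.0)
        risk += step * multiplier
        if step > 0:
            multiplier *= escalation  # later steps in a risky chain weigh more
    return risk

# e.g. chain_risk(["open_settings", "grant_permission", "read_sms",
#                  "send_data_external"]) ~= 2.68, above a threshold like 1.5
```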
arXiv Detail & Related papers (2025-07-01T15:10:00Z)
- Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications. We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation. We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
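A minimal sketch of the synthetic-data loop such a framework might use, with every function name a hypothetical placeholder rather than AutoSafe's interface: generate a risky scenario, roll the agent out in simulation, and keep unsafe rollouts as training data.

```python
# Hypothetical sketch of automated safety-data synthesis.
def generate_scenario(risk_category: str) -> str:
    """Placeholder: an LLM drafts a risky task for the given category."""
    return f"synthetic scenario for {risk_category}"

def simulate_agent(scenario: str) -> tuple[str, bool]:
    """Placeholder: run the agent in a sandbox, no real-world side effects."""
    return "agent trajectory", False  # (trajectory, was_safe)

def build_safety_dataset(categories: list[str], per_category: int) -> list[dict]:
    dataset = []
    for cat in categories:
        for _ in range(per_category):
            scenario = generate_scenario(cat)
            trajectory, was_safe = simulate_agent(scenario)
            if not was_safe:  # unsafe rollouts become corrective training data
                dataset.append({"scenario": scenario, "trajectory": trajectory})
    return dataset
```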
arXiv Detail & Related papers (2025-05-23T10:56:06Z)
- Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report [50.268821168513654]
We present Foundation-Sec-8B, a cybersecurity-focused large language model (LLM) built on the Llama 3.1 architecture. We evaluate it across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks. By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
arXiv Detail & Related papers (2025-04-28T08:41:12Z)
- OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities [0.0]
We demonstrate a new approach to assessing AI's progress towards enabling and scaling real-world offensive cyber operations. We detail OCCULT, a lightweight operational evaluation framework that allows cyber security experts to contribute to rigorous and repeatable measurement. We find that there has been significant recent advancement in the risks of AI being used to scale realistic cyber threats.
arXiv Detail & Related papers (2025-02-18T19:33:14Z)
- Countering Autonomous Cyber Threats [40.00865970939829]
Foundation Models present dual-use concerns broadly and within the cyber domain specifically.
Recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations.
This work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks.
arXiv Detail & Related papers (2024-10-23T22:46:44Z)
- The MESA Security Model 2.0: A Dynamic Framework for Mitigating Stealth Data Exfiltration [0.0]
Stealth Data Exfiltration is a significant cyber threat characterized by covert infiltration, extended undetectability, and unauthorized dissemination of confidential data.
Our findings reveal that conventional defense-in-depth strategies often fall short in combating these sophisticated threats.
As we navigate this complex landscape, it is crucial to anticipate potential threats and continually update our defenses.
arXiv Detail & Related papers (2024-05-17T16:14:45Z)
- A Zero Trust Framework for Realization and Defense Against Generative AI Attacks in Power Grid [62.91192307098067]
This paper proposes a novel zero trust framework for a power grid supply chain (PGSC).
It facilitates early detection of potential GenAI-driven attack vectors, assessment of tail risk-based stability measures, and mitigation of such threats.
Experimental results show that the proposed zero trust framework achieves an accuracy of 95.7% on attack vector generation, a risk measure of 9.61% for a 95% stable PGSC, and a 99% confidence in defense against GenAI-driven attacks.
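The summary does not define the tail risk measure; if it is something like Conditional Value-at-Risk (CVaR), which is an assumption on our part, it could be computed along these lines:

```python
# Assumption: the "tail risk-based stability measure" resembles CVaR.
import numpy as np

def cvar(losses: np.ndarray, alpha: float = 0.95) -> float:
    """Mean loss in the worst (1 - alpha) tail of the loss distribution."""
    var = np.quantile(losses, alpha)  # Value-at-Risk at level alpha
    tail = losses[losses >= var]
    return float(tail.mean())

# e.g. on simulated grid-instability losses (illustrative data only):
rng = np.random.default_rng(0)
losses = rng.exponential(scale=0.03, size=10_000)
print(cvar(losses, alpha=0.95))  # mean of the worst 5% of outcomes
```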
arXiv Detail & Related papers (2024-03-11T02:47:21Z)
- Mind the Gap: Securely modeling cyber risk based on security deviations from a peer group [2.7910505923792646]
This paper proposes a new framework for benchmarking cyber posture against peers and estimating cyber risk within specific economic sectors.
We introduce a new top-line variable called the Defense Gap Index representing the weighted security gap between an organization and its peers.
We apply this approach in a specific sector using data collected from 25 large firms.
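One plausible reading of a weighted Defense Gap Index (the exact weighting scheme is not given in this summary, so this is a sketch under assumptions) is a weighted sum of per-control shortfalls relative to the peer average:

```python
# Illustrative reading: DGI = sum_i w_i * max(0, peer_i - org_i) over controls.
def defense_gap_index(org: dict[str, float],
                      peer_avg: dict[str, float],
                      weights: dict[str, float]) -> float:
    """Weighted gap between an organization and its peer group (0 = no gap)."""
    return sum(
        weights[c] * max(0.0, peer_avg[c] - org[c])
        for c in weights
    )

# Example with hypothetical control scores on a 0-1 scale:
org      = {"patching": 0.6, "mfa": 0.9, "logging": 0.4}
peer_avg = {"patching": 0.8, "mfa": 0.7, "logging": 0.7}
weights  = {"patching": 0.5, "mfa": 0.3, "logging": 0.2}
print(defense_gap_index(org, peer_avg, weights))  # 0.5*0.2 + 0 + 0.2*0.3 = 0.16
```

Only controls where the organization trails its peers contribute, so a surplus on one control (mfa above) cannot mask a shortfall on another.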
arXiv Detail & Related papers (2024-02-06T17:22:45Z)
- Model evaluation for extreme risks [46.53170857607407]
Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills.
We explain why model evaluation is critical for addressing extreme risks.
arXiv Detail & Related papers (2023-05-24T16:38:43Z)
- Defending against cybersecurity threats to the payments and banking system [0.0]
The proliferation of cybercrime is a major concern for various stakeholders in the banking sector.
To prevent risks of cyber-attacks on software systems, entities operating within cyberspace must be identified.
This paper examines various approaches that identify assets in cyberspace, classify cyber threats, provide security defenses, and map security measures to control types and functionalities.
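A small illustration of the last step, mapping security measures to control types and functionalities, using hypothetical categories rather than the paper's taxonomy:

```python
# Illustrative mapping of measures to control type and function.
SECURITY_MEASURES = {
    "firewall":       {"control_type": "preventive", "function": "network filtering"},
    "ids":            {"control_type": "detective",  "function": "intrusion detection"},
    "backup_restore": {"control_type": "corrective", "function": "recovery"},
    "mfa":            {"control_type": "preventive", "function": "authentication"},
}

def measures_by_control_type(control_type: str) -> list[str]:
    """List all measures of a given control type."""
    return [m for m, meta in SECURITY_MEASURES.items()
            if meta["control_type"] == control_type]

# e.g. measures_by_control_type("preventive") -> ["firewall", "mfa"]
```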
arXiv Detail & Related papers (2022-12-15T11:55:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.