Dynamic Risk Assessments for Offensive Cybersecurity Agents
- URL: http://arxiv.org/abs/2505.18384v3
- Date: Wed, 16 Jul 2025 17:51:06 GMT
- Title: Dynamic Risk Assessments for Offensive Cybersecurity Agents
- Authors: Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson
- Abstract summary: We argue that assessments should take into account the varying degrees of freedom that an adversary may possess. We show that adversaries can improve an agent's cybersecurity capability on InterCode CTF by more than 40% relative to the baseline.
- Score: 8.009741580969873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models are increasingly becoming better autonomous programmers, raising the prospect that they could also automate dangerous offensive cyber-operations. Current frontier model audits probe the cybersecurity risks of such agents, but most fail to account for the degrees of freedom available to adversaries in the real world. In particular, with strong verifiers and financial incentives, agents for offensive cybersecurity are amenable to iterative improvement by would-be adversaries. We argue that assessments should take into account an expanded threat model in the context of cybersecurity, emphasizing the varying degrees of freedom that an adversary may possess in stateful and non-stateful environments within a fixed compute budget. We show that even with a relatively small compute budget (8 H100 GPU Hours in our study), adversaries can improve an agent's cybersecurity capability on InterCode CTF by more than 40% relative to the baseline -- without any external assistance. These results highlight the need to evaluate agents' cybersecurity risk in a dynamic manner, painting a more representative picture of risk.
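To make the expanded threat model concrete, here is a minimal sketch of the kind of verifier-driven, fixed-budget iteration the abstract describes. All names here (`run_agent`, `verifier_score`, the wall-clock budget accounting) are hypothetical placeholders, not the paper's actual harness:

```python
# Hypothetical sketch: an adversary repeatedly reruns an agent on a CTF task,
# keeping the best-scoring attempt, until a fixed compute budget is exhausted.
# `run_agent` and `verifier_score` are placeholders, not APIs from the paper.
import time

def run_agent(task: str, hint: str | None = None) -> str:
    """Placeholder: one rollout of an offensive-cyber agent on a task."""
    return f"transcript for {task} (hint={hint})"

def verifier_score(task: str, transcript: str) -> float:
    """Placeholder: a strong verifier, e.g. did the agent capture the flag."""
    return 0.0

def improve_under_budget(task: str, budget_seconds: float) -> tuple[str, float]:
    best_transcript, best_score = "", float("-inf")
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        # Each iteration may condition on the best attempt so far (self-refinement).
        transcript = run_agent(task, hint=best_transcript or None)
        score = verifier_score(task, transcript)
        if score > best_score:
            best_transcript, best_score = transcript, score
    return best_transcript, best_score
```

The key point the sketch captures is that the evaluated quantity is not a single rollout's success rate but the best outcome an adversary can buy with a bounded budget.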
Related papers
- Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z)
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
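As a rough illustration of the hybrid detection idea (not OpenAgentSafety's actual interface), cheap rule-based checks can screen for overt violations while an LLM judge handles subtler cases; the patterns and the `llm_judge` stub below are assumptions:

```python
# Hedged sketch of rule-based screening plus LLM-as-judge escalation.
import re

OVERT_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),           # destructive shell command
    re.compile(r"curl[^|]*\|\s*(ba)?sh"),  # piping a download into a shell
]

def rule_based_flag(action: str) -> bool:
    """Cheap first pass: does the action match any overtly unsafe pattern?"""
    return any(p.search(action) for p in OVERT_PATTERNS)

def llm_judge(action: str, context: str) -> bool:
    """Placeholder: query a judge model with the action and its context."""
    return False  # a real implementation would parse the judge's verdict

def is_unsafe(action: str, context: str) -> bool:
    if rule_based_flag(action):        # overt unsafe behavior
        return True
    return llm_judge(action, context)  # subtle cases go to the judge
```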
arXiv Detail & Related papers (2025-07-08T16:18:54Z)
- SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents. It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information. It also designs an automated assisted assessment scheme based on a large language model.
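A toy sketch of chain-level risk scoring, assuming (this is our illustration, not SafeMobile's mechanism) that risk accumulates over the behavioral sequence, so individually benign steps can still form a risky chain:

```python
# Illustrative only: per-step weights and the escalation factor are invented.
RISKY_STEP_WEIGHTS = {
    "open_settings": 0.1,
    "grant_permission": 0.4,
    "read_sms": 0.5,
    "send_data_external": 0.8,
}

def chain_risk(actions: list[str], escalation: float = 1.2) -> float:
    """Accumulate risk along the chain; each risky step amplifies the next."""
    risk, multiplier = 0.0, 1.0
    for action in actions:
        step = RISKY_STEP_WEIGHTS.get(action, 0.0)
        risk += step * multiplier
        if step > 0:
            multiplier *= escalation  # later steps in a risky chain weigh more
    return risk

# e.g. chain_risk(["open_settings", "grant_permission", "read_sms",
#                  "send_data_external"]) ~= 2.68, above a threshold like 1.5
```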
arXiv Detail & Related papers (2025-07-01T15:10:00Z)
- Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications. We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation. We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
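A minimal sketch of the synthetic-data loop such a framework might use, with every function name a hypothetical placeholder rather than AutoSafe's interface: generate a risky scenario, roll the agent out in simulation, and keep unsafe rollouts as training data.

```python
# Hypothetical sketch of automated safety-data synthesis.
def generate_scenario(risk_category: str) -> str:
    """Placeholder: an LLM drafts a risky task for the given category."""
    return f"synthetic scenario for {risk_category}"

def simulate_agent(scenario: str) -> tuple[str, bool]:
    """Placeholder: run the agent in a sandbox, no real-world side effects."""
    return "agent trajectory", False  # (trajectory, was_safe)

def build_safety_dataset(categories: list[str], per_category: int) -> list[dict]:
    dataset = []
    for cat in categories:
        for _ in range(per_category):
            scenario = generate_scenario(cat)
            trajectory, was_safe = simulate_agent(scenario)
            if not was_safe:  # unsafe rollouts become corrective training data
                dataset.append({"scenario": scenario, "trajectory": trajectory})
    return dataset
```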
arXiv Detail & Related papers (2025-05-23T10:56:06Z)
- Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report [50.268821168513654]
We present Foundation-Sec-8B, a cybersecurity-focused large language model (LLM) built on the Llama 3.1 architecture. We evaluate it across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks. By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
arXiv Detail & Related papers (2025-04-28T08:41:12Z)
- OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities [0.0]
We demonstrate a new approach to assessing AI's progress towards enabling and scaling real-world offensive cyber operations. We detail OCCULT, a lightweight operational evaluation framework that allows cyber security experts to contribute to rigorous and repeatable measurement. We find that there has been significant recent advancement in the risks of AI being used to scale realistic cyber threats.
arXiv Detail & Related papers (2025-02-18T19:33:14Z)
- Countering Autonomous Cyber Threats [40.00865970939829]
Foundation Models present dual-use concerns broadly and within the cyber domain specifically.
Recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations.
This work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks.
arXiv Detail & Related papers (2024-10-23T22:46:44Z)
- The MESA Security Model 2.0: A Dynamic Framework for Mitigating Stealth Data Exfiltration [0.0]
Stealth Data Exfiltration is a significant cyber threat characterized by covert infiltration, extended undetectability, and unauthorized dissemination of confidential data.
Our findings reveal that conventional defense-in-depth strategies often fall short in combating these sophisticated threats.
As we navigate this complex landscape, it is crucial to anticipate potential threats and continually update our defenses.
arXiv Detail & Related papers (2024-05-17T16:14:45Z)
- A Zero Trust Framework for Realization and Defense Against Generative AI Attacks in Power Grid [62.91192307098067]
This paper proposes a novel zero trust framework for a power grid supply chain (PGSC).
It facilitates early detection of potential GenAI-driven attack vectors, assessment of tail risk-based stability measures, and mitigation of such threats.
Experimental results show that the proposed zero trust framework achieves an accuracy of 95.7% on attack vector generation, a risk measure of 9.61% for a 95% stable PGSC, and a 99% confidence in defense against GenAI-driven attacks.
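The summary does not define the tail risk measure; if it is something like Conditional Value-at-Risk (CVaR), which is an assumption on our part, it could be computed along these lines:

```python
# Assumption: the "tail risk-based stability measure" resembles CVaR.
import numpy as np

def cvar(losses: np.ndarray, alpha: float = 0.95) -> float:
    """Mean loss in the worst (1 - alpha) tail of the loss distribution."""
    var = np.quantile(losses, alpha)  # Value-at-Risk at level alpha
    tail = losses[losses >= var]
    return float(tail.mean())

# e.g. on simulated grid-instability losses (illustrative data only):
rng = np.random.default_rng(0)
losses = rng.exponential(scale=0.03, size=10_000)
print(cvar(losses, alpha=0.95))  # mean of the worst 5% of outcomes
```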
arXiv Detail & Related papers (2024-03-11T02:47:21Z)
- Mind the Gap: Securely modeling cyber risk based on security deviations from a peer group [2.7910505923792646]
This paper proposes a new framework for benchmarking cyber posture against peers and estimating cyber risk within specific economic sectors.
We introduce a new top-line variable called the Defense Gap Index representing the weighted security gap between an organization and its peers.
We apply this approach in a specific sector using data collected from 25 large firms.
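One plausible reading of a weighted Defense Gap Index (the exact weighting scheme is not given in this summary, so this is a sketch under assumptions) is a weighted sum of per-control shortfalls relative to the peer average:

```python
# Illustrative reading: DGI = sum_i w_i * max(0, peer_i - org_i) over controls.
def defense_gap_index(org: dict[str, float],
                      peer_avg: dict[str, float],
                      weights: dict[str, float]) -> float:
    """Weighted gap between an organization and its peer group (0 = no gap)."""
    return sum(
        weights[c] * max(0.0, peer_avg[c] - org[c])
        for c in weights
    )

# Example with hypothetical control scores on a 0-1 scale:
org      = {"patching": 0.6, "mfa": 0.9, "logging": 0.4}
peer_avg = {"patching": 0.8, "mfa": 0.7, "logging": 0.7}
weights  = {"patching": 0.5, "mfa": 0.3, "logging": 0.2}
print(defense_gap_index(org, peer_avg, weights))  # 0.5*0.2 + 0 + 0.2*0.3 = 0.16
```

Only controls where the organization trails its peers contribute, so a surplus on one control (mfa above) cannot mask a shortfall on another.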
arXiv Detail & Related papers (2024-02-06T17:22:45Z)
- Model evaluation for extreme risks [46.53170857607407]
Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills.
We explain why model evaluation is critical for addressing extreme risks.
arXiv Detail & Related papers (2023-05-24T16:38:43Z)
- Defending against cybersecurity threats to the payments and banking system [0.0]
The proliferation of cybercrime is a major concern for various stakeholders in the banking sector.
To prevent risks of cyber-attacks on software systems, entities operating within cyberspace must be identified.
This paper examines various approaches that identify assets in cyberspace, classify cyber threats, provide security defenses, and map security measures to control types and functionalities.
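A small illustration of the last step, mapping security measures to control types and functionalities, using hypothetical categories rather than the paper's taxonomy:

```python
# Illustrative mapping of measures to control type and function.
SECURITY_MEASURES = {
    "firewall":       {"control_type": "preventive", "function": "network filtering"},
    "ids":            {"control_type": "detective",  "function": "intrusion detection"},
    "backup_restore": {"control_type": "corrective", "function": "recovery"},
    "mfa":            {"control_type": "preventive", "function": "authentication"},
}

def measures_by_control_type(control_type: str) -> list[str]:
    """List all measures of a given control type."""
    return [m for m, meta in SECURITY_MEASURES.items()
            if meta["control_type"] == control_type]

# e.g. measures_by_control_type("preventive") -> ["firewall", "mfa"]
```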
arXiv Detail & Related papers (2022-12-15T11:55:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.