Related papers: Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities

URL: http://arxiv.org/abs/2410.09114v2
Date: Sat, 2 Nov 2024 09:35:35 GMT
Title: Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Authors: Andrey Anurin, Jonathan Ng, Kibo Schaffer, Jason Schreiber, Esben Kran,
Abstract summary: We introduce the Catastrophic Cyber Capabilities Benchmark (3CB), a framework designed to rigorously assess the real-world offensive capabilities of LLM agents. Our evaluation of modern LLMs on 3CB reveals that frontier models, such as GPT-4o and Claude 3.5 Sonnet, can perform offensive tasks such as reconnaissance and exploitation. Our software solution and the corresponding benchmark provides a critical tool to reduce the gap between rapidly improving capabilities and robustness of cyber offense evaluations.
Score: 1.1359551336076306
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: LLM agents have the potential to revolutionize defensive cyber operations, but their offensive capabilities are not yet fully understood. To prepare for emerging threats, model developers and governments are evaluating the cyber capabilities of foundation models. However, these assessments often lack transparency and a comprehensive focus on offensive capabilities. In response, we introduce the Catastrophic Cyber Capabilities Benchmark (3CB), a novel framework designed to rigorously assess the real-world offensive capabilities of LLM agents. Our evaluation of modern LLMs on 3CB reveals that frontier models, such as GPT-4o and Claude 3.5 Sonnet, can perform offensive tasks such as reconnaissance and exploitation across domains ranging from binary analysis to web technologies. Conversely, smaller open-source models exhibit limited offensive capabilities. Our software solution and the corresponding benchmark provides a critical tool to reduce the gap between rapidly improving capabilities and robustness of cyber offense evaluations, aiding in the safer deployment and regulation of these powerful technologies.

Related papers

CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.<n>Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.<n>We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
Capability-Based Scaling Laws for LLM Red-Teaming [71.89259138609965]
Traditional prompt-engineering approaches may prove ineffective once red-teaming turns into a weak-to-strong problem.<n>We evaluate more than 500 attacker-target pairs using LLM-based jailbreak attacks that mimic human red-teamers.<n>We derive a jailbreaking scaling law that predicts attack success for a fixed target based on attacker-target capability gap.
arXiv Detail & Related papers (2025-05-26T16:05:41Z)
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report [50.268821168513654]
We present Foundation-Sec-8B, a cybersecurity-focused large language model (LLMs) built on the Llama 3.1 architecture. We evaluate it across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks. By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
arXiv Detail & Related papers (2025-04-28T08:41:12Z)
T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models [88.63040835652902]
Text to video models are vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of harmful or unsafe content. We propose T2VShield, a comprehensive and model agnostic defense framework designed to protect text to video models from jailbreak threats. Our method systematically analyzes the input, model, and output stages to identify the limitations of existing defenses.
arXiv Detail & Related papers (2025-04-22T01:18:42Z)
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities [0.0]
We demonstrate a new approach to assessing AI's progress towards enabling and scaling real-world offensive cyber operations. We detail OCCULT, a lightweight operational evaluation framework that allows cyber security experts to contribute to rigorous and repeatable measurement. We find that there has been significant recent advancement in the risks of AI being used to scale realistic cyber threats.
arXiv Detail & Related papers (2025-02-18T19:33:14Z)
ChatNVD: Advancing Cybersecurity Vulnerability Assessment with Large Language Models [0.46873264197900916]
This paper explores the potential application of Large Language Models (LLMs) to enhance the assessment of software vulnerabilities. We develop three variants of ChatNVD, utilizing three prominent LLMs: GPT-4o mini by OpenAI, Llama 3 by Meta, and Gemini 1.5 Pro by Google. To evaluate their efficacy, we conduct a comparative analysis of these models using a comprehensive questionnaire comprising common security vulnerability questions.
arXiv Detail & Related papers (2024-12-06T03:45:49Z)
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [97.82118821263825]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. We propose ICER, a novel red-teaming framework that generates interpretable and semantic meaningful problematic prompts. Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
arXiv Detail & Related papers (2024-11-25T04:17:24Z)
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics [70.93622520400385]
This paper systematically quantifies the robustness of VLA-based robotic systems. We introduce an untargeted position-aware attack objective that leverages spatial foundations to destabilize robotic actions. We also design an adversarial patch generation approach that places a small, colorful patch within the camera's view, effectively executing the attack in both digital and physical environments.
arXiv Detail & Related papers (2024-11-18T01:52:20Z)
Countering Autonomous Cyber Threats [40.00865970939829]
Foundation Models present dual-use concerns broadly and within the cyber domain specifically. Recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations. This work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks.
arXiv Detail & Related papers (2024-10-23T22:46:44Z)
The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks [2.6528263069045126]
Large language models (LLMs) could soon become integral to autonomous cyber agents. We introduce novel defense strategies that exploit the inherent vulnerabilities of attacking LLMs. Our results show defense success rates of up to 90%, demonstrating the effectiveness of turning LLM vulnerabilities into defensive strategies.
arXiv Detail & Related papers (2024-10-20T14:07:24Z)
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges [46.032173498399885]
Large Language Models (LLMs) have significantly impacted various domains, including Web search, healthcare, and software development. As these models scale, they become more vulnerable to cybersecurity risks, particularly backdoor attacks.
arXiv Detail & Related papers (2024-09-30T06:31:36Z)
Generative AI and Large Language Models for Cyber Security: All Insights You Need [0.06597195879147556]
This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs) We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection. We present an overview of LLM evolution and its current state, focusing on advancements in models such as GPT-4, GPT-3.5, Mixtral-8x7B, BERT, Falcon2, and LLaMA.
arXiv Detail & Related papers (2024-05-21T13:02:27Z)
CANAL -- Cyber Activity News Alerting Language Model: Empirical Approach vs. Expensive LLM [0.0]
This research presents a new empirical framework for cyber threat modeling, adept at parsing and categorizing cyber-related information from news articles. At the core of this framework is a fine-tuned BERT model, which we call CANAL - Cyber Activity News Alerting Language Model. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, highlighting their zero to few-shot learning in cyber news classification. We introduce the Cyber Signal Discovery module, a strategic component designed to efficiently detect emerging cyber signals from news articles.
arXiv Detail & Related papers (2024-05-10T18:57:35Z)
Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models [4.960466107669653]
We introduce Crimson, a system that enhances the strategic reasoning capabilities of Large Language Models (LLMs) within the realm of cybersecurity. By correlating CVEs with MITRE ATT&CK techniques, Crimson advances threat anticipation and strategic defense efforts.
arXiv Detail & Related papers (2024-03-01T08:43:43Z)
Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks. This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks. We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z)
Baseline Defenses for Adversarial Attacks Against Aligned Language Models [109.75753454188705]
Recent work shows that text moderations can produce jailbreaking prompts that bypass defenses. We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training. We find that the weakness of existing discretes for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z)
MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks [65.86360607693457]
No-box attacks, where adversaries have no prior knowledge, remain relatively underexplored despite its practical relevance. This work presents a systematic investigation into leveraging large-scale Vision-Language Models (VLMs) as surrogate models for executing no-box attacks. Our theoretical and empirical analyses reveal a key limitation in the execution of no-box attacks stemming from insufficient discriminative capabilities for direct application of vanilla CLIP as a surrogate model. We propose MF-CLIP: a novel framework that enhances CLIP's effectiveness as a surrogate model through margin-aware feature space optimization.
arXiv Detail & Related papers (2023-07-13T08:10:48Z)
Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers. We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions. We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients. However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training. We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.