Related papers: Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report

Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report

URL: http://arxiv.org/abs/2504.21039v1
Date: Mon, 28 Apr 2025 08:41:12 GMT
Title: Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
Authors: Paul Kassianik, Baturay Saglam, Alexander Chen, Blaine Nelson, Anu Vellore, Massimo Aufiero, Fraser Burch, Dhruv Kedia, Avi Zohary, Sajana Weerawardhena, Aman Priyanshu, Adam Swanda, Amy Chang, Hyrum Anderson, Kojin Oshiba, Omar Santos, Yaron Singer, Amin Karbasi,
Abstract summary: We present Foundation-Sec-8B, a cybersecurity-focused large language model (LLMs) built on the Llama 3.1 architecture.<n>We evaluate it across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks.<n>By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
Score: 50.268821168513654
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As transformer-based large language models (LLMs) increasingly permeate society, they have revolutionized domains such as software engineering, creative writing, and digital arts. However, their adoption in cybersecurity remains limited due to challenges like scarcity of specialized training data and complexity of representing cybersecurity-specific knowledge. To address these gaps, we present Foundation-Sec-8B, a cybersecurity-focused LLM built on the Llama 3.1 architecture and enhanced through continued pretraining on a carefully curated cybersecurity corpus. We evaluate Foundation-Sec-8B across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks. By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.

Related papers

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report [23.285449541240325]
We release Foundation-Sec-8B-Instruct, a model specifically trained for general-purpose cybersecurity dialogue.<n>It combines domain-specific knowledge with instruction-following, conversational capabilities, and alignment with human preferences to produce high-quality, relevant responses.<n> Comprehensive evaluations show that Foundation-Sec-8B-Instruct outperforms Llama 3.1-8B-Instruct on a range of cybersecurity tasks.
arXiv Detail & Related papers (2025-08-01T20:25:57Z)
Less Data, More Security: Advancing Cybersecurity LLMs Specialization via Resource-Efficient Domain-Adaptive Continuous Pre-training with Minimal Tokens [1.2116854758481395]
Domain-Adaptive Continuous Pretraining (DAP) is a methodology for enhancing cybersecurity understanding in large language models (LLMs)<n>We adapted three decoder-based architectures using a curated 126-million-word cybersecurity corpus from standards, academic literature, and various other sources.<n>The Llama-3.3-70B-Ins-DAP model achieved state-of-the-art accuracies of 0.718, 0.933, and 0.864, respectively, outperforming specialized models.
arXiv Detail & Related papers (2025-06-30T12:59:29Z)
CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.<n>Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.<n>We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution [22.86304661035188]
Large Language Model (LLM) agents can automate cybersecurity tasks and can adapt to the evolving cybersecurity landscape without re-engineering.<n>But they have two key limitations: accessing latest cybersecurity expertise beyond training data, and integrating new knowledge into complex task planning.<n>We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms.
arXiv Detail & Related papers (2025-05-21T11:01:11Z)
The Digital Cybersecurity Expert: How Far Have We Come? [49.89857422097055]
We develop CSEBenchmark, a fine-grained cybersecurity evaluation framework based on 345 knowledge points expected of cybersecurity experts.<n>We evaluate 12 popular large language models (LLMs) on CSEBenchmark and find that even the best-performing model achieves only 85.42% overall accuracy.<n>By identifying and addressing specific knowledge gaps in each LLM, we achieve up to an 84% improvement in correcting previously incorrect predictions.
arXiv Detail & Related papers (2025-04-16T05:36:28Z)
Position: Mind the Gap-the Growing Disconnect Between Established Vulnerability Disclosure and AI Security [56.219994752894294]
We argue that adapting existing processes for AI security reporting is doomed to fail due to fundamental shortcomings for the distinctive characteristics of AI systems.<n>Based on our proposal to address these shortcomings, we discuss an approach to AI security reporting and how the new AI paradigm, AI agents, will further reinforce the need for specialized AI security incident reporting advancements.
arXiv Detail & Related papers (2024-12-19T13:50:26Z)
Global Challenge for Safe and Secure LLMs Track 1 [57.08717321907755]
The Global Challenge for Safe and Secure Large Language Models (LLMs) is a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks.
arXiv Detail & Related papers (2024-11-21T08:20:31Z)
CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions [0.2999888908665658]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) capabilities, providing versatile capabilities across various applications. However, their application to complex, domain-specific tasks, such as cyber-security, often faces substantial challenges. In this study, we introduce SecKnowledge and CyberPal.AI to address these challenges and train security-expert LLMs.
arXiv Detail & Related papers (2024-08-17T22:37:39Z)
Generative AI in Cybersecurity: A Comprehensive Review of LLM Applications and Vulnerabilities [1.0974825157329373]
This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs)<n>We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection.<n>We present an overview of LLM evolution and its current state, focusing on advancements in models such as GPT-4, GPT-3.5, Mixtral-8x7B, BERT, Falcon2, and LLaMA.
arXiv Detail & Related papers (2024-05-21T13:02:27Z)
SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence [27.550484938124193]
This paper introduces a framework to benchmark, elicit, and improve cybersecurity incident analysis and response abilities. We create a high-quality bilingual instruction corpus by crawling cybersecurity raw text from cybersecurity websites. The instruction dataset SEvenLLM-Instruct is used to train cybersecurity LLMs with the multi-task learning objective.
arXiv Detail & Related papers (2024-05-06T13:17:43Z)
The Security and Privacy of Mobile Edge Computing: An Artificial Intelligence Perspective [64.36680481458868]
Mobile Edge Computing (MEC) is a new computing paradigm that enables cloud computing and information technology (IT) services to be delivered at the network's edge. This paper provides a survey of security and privacy in MEC from the perspective of Artificial Intelligence (AI) We focus on new security and privacy issues, as well as potential solutions from the viewpoints of AI.
arXiv Detail & Related papers (2024-01-03T07:47:22Z)
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models [41.068780235482514]
This paper presents CyberSecEval, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. CyberSecEval provides a thorough evaluation of LLMs in two crucial security domains: their propensity to generate insecure code and their level of compliance when asked to assist in cyberattacks.
arXiv Detail & Related papers (2023-12-07T22:07:54Z)
Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society. Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities. With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z)
Recognizing and Extracting Cybersecurtity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks. CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG)
arXiv Detail & Related papers (2022-08-02T18:44:06Z)
Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop at AAAI 2022 [55.573187938617636]
The workshop will focus on the application of AI to problems in cyber security. Cyber systems generate large volumes of data, utilizing this effectively is beyond human capabilities.
arXiv Detail & Related papers (2022-02-28T18:27:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.