Fine-tuning of Large Language Models for Domain-Specific Cybersecurity Knowledge
- URL: http://arxiv.org/abs/2509.25241v1
- Date: Thu, 25 Sep 2025 12:25:11 GMT
- Title: Fine-tuning of Large Language Models for Domain-Specific Cybersecurity Knowledge
- Authors: Yuan Huang
- Abstract summary: We explore fine-tuning strategies to embed cybersecurity knowledge into Large Language Models (LLMs). We investigate Supervised Fine-Tuning (SFT), Low-Rank Adaptation (LoRA), and Quantized Low-Rank Adaptation (QLoRA) using a cybersecurity Q&A dataset. Our work highlights the potential of low-rank fine-tuning strategies to bridge the gap between general-purpose LLMs and domain-specific applications.
- Score: 3.728154028384911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in training paradigms for Large Language Models (LLMs) have unlocked their remarkable capabilities in natural language processing and cross-domain generalization. While LLMs excel in tasks like programming and mathematical problem-solving, their zero-shot performance in specialized domains requiring expert knowledge, such as cybersecurity, is often suboptimal. This limitation arises because foundational LLMs are designed for general-purpose applications, constraining their ability to encapsulate domain-specific expertise within their parameter space. To address this, we explore fine-tuning strategies to embed cybersecurity knowledge into LLMs, enhancing their performance in cybersecurity question-answering (Q&A) tasks while prioritizing computational efficiency. Specifically, we investigate Supervised Fine-Tuning (SFT), Low-Rank Adaptation (LoRA), and Quantized Low-Rank Adaptation (QLoRA) using a cybersecurity Q&A dataset. Our results demonstrate that these fine-tuning approaches significantly outperform the foundational model in cybersecurity Q&A tasks. Moreover, LoRA and QLoRA achieve comparable performance to SFT with substantially lower computational costs, offering an efficient pathway for adapting LLMs to specialized domains. Our work highlights the potential of low-rank fine-tuning strategies to bridge the gap between general-purpose LLMs and domain-specific applications.
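The abstract names the LoRA/QLoRA recipe but not a base model, dataset format, or hyperparameters, so the following is only a minimal sketch of that recipe using the Hugging Face transformers, peft, and bitsandbytes libraries; the model name, data file, column names, and training settings below are illustrative assumptions, not values reported by the paper.

```python
# Illustrative QLoRA fine-tuning sketch for a cybersecurity Q&A dataset
# (not the paper's exact setup).
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "meta-llama/Llama-2-7b-hf"   # assumed base model, not specified in the abstract
DATA_FILE = "cybersecurity_qa.jsonl"      # assumed Q&A file with "question"/"answer" fields

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA: load the frozen base weights in 4-bit NF4 precision to cut memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# LoRA: train small low-rank adapter matrices instead of the full parameter set.
# (Dropping the quantization config above gives plain LoRA; full SFT would update all weights.)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters

def to_features(example):
    text = f"Question: {example['question']}\nAnswer: {example['answer']}"
    return tokenizer(text, truncation=True, max_length=512)

train_set = (load_dataset("json", data_files=DATA_FILE, split="train")
             .map(to_features, remove_columns=["question", "answer"]))

trainer = Trainer(
    model=model,
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=TrainingArguments(
        output_dir="qlora-cyber-qa",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        bf16=True,
    ),
)
trainer.train()
```

Swapping in full-parameter SFT would mean loading the model in full precision and updating every weight; the low-rank adapter variants trade a small amount of flexibility for the substantially lower compute and memory cost the abstract highlights.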
Related papers
- RedSage: A Cybersecurity Generalist LLM [45.91667919408369]
RedSage is an open-source, locally deployable cybersecurity assistant with domain-aware pretraining and post-training. We combine large-scale web filtering with manual collection of high-quality resources, spanning 28.6K documents across frameworks, offensive techniques, and security tools. RedSage is evaluated on established cybersecurity benchmarks (e.g., CTI-Bench, CyberMetric, SECURE) and general LLM benchmarks to assess broader generalization.
arXiv Detail & Related papers (2026-01-29T18:59:57Z) - CTIArena: Benchmarking LLM Knowledge and Reasoning Across Heterogeneous Cyber Threat Intelligence [48.63397742510097]
Cyber threat intelligence (CTI) is central to modern cybersecurity, providing critical insights for detecting and mitigating evolving threats. With the natural language understanding and reasoning capabilities of large language models (LLMs), there is increasing interest in applying them to CTI. We present CTIArena, the first benchmark for evaluating LLM performance on heterogeneous, multi-source CTI.
arXiv Detail & Related papers (2025-10-13T22:10:17Z) - Agent Fine-tuning through Distillation for Domain-specific LLMs in Microdomains [6.323778761045108]
Agentic large language models (LLMs) have become prominent for autonomously interacting with external environments. This paper explores agent fine-tuning for domain adaptation within Hitachi's JP1 microdomain for specialized IT operations.
arXiv Detail & Related papers (2025-10-01T04:04:53Z) - Less Data, More Security: Advancing Cybersecurity LLMs Specialization via Resource-Efficient Domain-Adaptive Continuous Pre-training with Minimal Tokens [1.2116854758481395]
Domain-Adaptive Continuous Pretraining (DAP) is a methodology for enhancing cybersecurity understanding in large language models (LLMs). We adapted three decoder-based architectures using a curated 126-million-word cybersecurity corpus drawn from standards, academic literature, and various other sources. The Llama-3.3-70B-Ins-DAP model achieved state-of-the-art accuracies of 0.718, 0.933, and 0.864 across three benchmarks, outperforming specialized models.
arXiv Detail & Related papers (2025-06-30T12:59:29Z) - General-Reasoner: Advancing LLM Reasoning Across All Domains [64.70599911897595]
Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs). We propose General-Reasoner, a novel training paradigm designed to enhance LLM reasoning capabilities across diverse domains. We train a series of models and evaluate them on a wide range of datasets spanning domains such as physics, chemistry, finance, and electronics.
arXiv Detail & Related papers (2025-05-20T17:41:33Z) - The Digital Cybersecurity Expert: How Far Have We Come? [49.89857422097055]
We develop CSEBenchmark, a fine-grained cybersecurity evaluation framework based on 345 knowledge points expected of cybersecurity experts. We evaluate 12 popular large language models (LLMs) on CSEBenchmark and find that even the best-performing model achieves only 85.42% overall accuracy. By identifying and addressing specific knowledge gaps in each LLM, we achieve up to an 84% improvement in correcting previously incorrect predictions.
arXiv Detail & Related papers (2025-04-16T05:36:28Z) - Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents [6.318292471845427]
We develop the queuing fundamentals for large language model (LLM) inference. We prove that a large class of 'work-conserving' scheduling algorithms can achieve maximum throughput.
arXiv Detail & Related papers (2025-04-10T00:12:12Z) - A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness [31.758459020683574]
Small Language Models (SLMs) are increasingly favored for their low inference latency, cost-effectiveness, efficient development, and easy customization and adaptability. These models are particularly well-suited for resource-limited environments and domain knowledge acquisition. We propose defining SLMs by their capability to perform specialized tasks and suitability for resource-constrained settings.
arXiv Detail & Related papers (2024-11-04T04:43:01Z) - Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift. We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z) - CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions [0.2999888908665658]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP), providing versatile capabilities across various applications.
However, their application to complex, domain-specific tasks, such as cyber-security, often faces substantial challenges.
In this study, we introduce SecKnowledge and CyberPal.AI to address these challenges and train security-expert LLMs.
arXiv Detail & Related papers (2024-08-17T22:37:39Z) - BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z) - PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs [49.32067576992511]
Large language models often fall short of the performance achieved by domain-specific state-of-the-art models.
One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets.
We propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA).
Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks.
arXiv Detail & Related papers (2024-02-20T09:02:55Z) - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP).
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.