Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection
- URL: http://arxiv.org/abs/2506.10104v1
- Date: Wed, 11 Jun 2025 18:43:51 GMT
- Title: Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection
- Authors: David Farr, Kevin Talty, Alexandra Farr, John Stockdale, Iain Cruickshank, Jevin West
- Abstract summary: This study explores the use of Large Language Models (LLMs) in software vulnerability assessment by simulating the identification of Python code with known Common Weakness Enumerations (CWEs). Our results indicate that while zero-shot prompting performs poorly, few-shot prompting significantly enhances classification performance. Challenges such as model reliability, interpretability, and adversarial robustness remain critical areas for future research.
- Score: 38.083049237330826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As cyber threats become more sophisticated, rapid and accurate vulnerability detection is essential for maintaining secure systems. This study explores the use of Large Language Models (LLMs) in software vulnerability assessment by simulating the identification of Python code with known Common Weakness Enumerations (CWEs), comparing zero-shot, few-shot cross-domain, and few-shot in-domain prompting strategies. Our results indicate that while zero-shot prompting performs poorly, few-shot prompting significantly enhances classification performance, particularly when integrated with confidence-based routing strategies that improve efficiency by directing human experts to cases where model uncertainty is high, optimizing the balance between automation and expert oversight. We find that LLMs can effectively generalize across vulnerability categories with minimal examples, suggesting their potential as scalable, adaptable cybersecurity tools in simulated environments. However, challenges such as model reliability, interpretability, and adversarial robustness remain critical areas for future research. By integrating AI-driven approaches with expert-in-the-loop (EITL) decision-making, this work highlights a pathway toward more efficient and responsive cybersecurity workflows. Our findings provide a foundation for deploying AI-assisted vulnerability detection systems in both real and simulated environments that enhance operational resilience while reducing the burden on human analysts.
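The abstract attributes much of the efficiency gain to confidence-based routing, in which cases where the model is uncertain are handed to a human expert. It does not say how confidence is computed or where the cutoff lies, so the following is only a minimal Python sketch under stated assumptions: each prediction carries a scalar confidence in [0, 1], a single hypothetical threshold decides the routing, and experts are idealized as always assigning the correct CWE. The names `Prediction`, `route`, and `evaluate`, and the toy CWE data, are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """One LLM classification of a code sample (hypothetical record type)."""
    sample_id: str
    predicted_cwe: str  # e.g. "CWE-89"
    confidence: float   # assumed to be a scalar in [0, 1]
    true_cwe: str       # ground truth, available only in a simulated evaluation


def route(preds: List[Prediction], threshold: float) -> Tuple[List[Prediction], List[Prediction]]:
    """Accept confident predictions automatically; queue the rest for an expert."""
    auto = [p for p in preds if p.confidence >= threshold]
    expert = [p for p in preds if p.confidence < threshold]
    return auto, expert


def evaluate(preds: List[Prediction], threshold: float) -> Tuple[float, float]:
    """Return (system accuracy, fraction of samples sent to experts).

    Assumes, as an idealization, that experts always assign the correct CWE.
    """
    if not preds:
        raise ValueError("no predictions to evaluate")
    auto, expert = route(preds, threshold)
    correct_auto = sum(p.predicted_cwe == p.true_cwe for p in auto)
    accuracy = (correct_auto + len(expert)) / len(preds)
    expert_load = len(expert) / len(preds)
    return accuracy, expert_load


# Toy, made-up predictions for illustration only (not results from the paper).
preds = [
    Prediction("a", "CWE-89", 0.95, "CWE-89"),
    Prediction("b", "CWE-79", 0.40, "CWE-22"),
    Prediction("c", "CWE-78", 0.85, "CWE-78"),
    Prediction("d", "CWE-502", 0.55, "CWE-502"),
]

# Sweep the hypothetical threshold to expose the automation/oversight trade-off.
for t in (0.0, 0.5, 0.9):
    acc, load = evaluate(preds, t)
    print(f"threshold={t:.1f} accuracy={acc:.2f} expert_load={load:.2f}")
```

Raising the threshold trades automation for accuracy: on this toy data, a threshold of 0.5 sends one of four samples to an expert and removes the only automated error, while 0.9 adds two more expert reviews without any further gain. Where the useful operating point falls in practice depends on how well the model's confidence is calibrated.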
Related papers
- Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents [57.49020237126194]
Large language models (LLMs) have shown promise in assisting cybersecurity tasks, yet existing approaches struggle with automatic vulnerability discovery and exploitation. We propose Co-RedTeam, a security-aware multi-agent framework designed to mirror real-world red-teaming. Co-RedTeam decomposes vulnerability analysis into coordinated discovery and exploitation stages, enabling agents to plan, execute, validate, and refine actions.
(2026-02-02T14:38:45Z)
- ARTIS: Agentic Risk-Aware Test-Time Scaling via Iterative Simulation [72.78362530982109]
ARTIS (Agentic Risk-Aware Test-Time Scaling via Iterative Simulation) is a framework that decouples exploration from commitment. We show that naive LLM-based simulators struggle to capture rare but high-impact failure modes. We introduce a risk-aware tool simulator that emphasizes fidelity on failure-inducing actions.
(2026-02-02T06:33:22Z)
- Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges. Traditional intrusion detection systems fail to address the unique characteristics of aerial IoT environments. We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
(2026-01-25T12:47:25Z)
- AI Agentic Vulnerability Injection And Transformation with Optimized Reasoning [2.918225266151982]
We present AVIATOR, the first AI-agentic vulnerability injection workflow. It automatically injects realistic, category-specific vulnerabilities for high-fidelity, diverse, large-scale vulnerability dataset generation. It combines semantic analysis, injection synthesis enhanced with LoRA-based fine-tuning and Retrieval-Augmented Generation, as well as post-injection validation via static analysis and LLM-based discriminators.
(2025-08-28T14:59:39Z)
- Advancing Autonomous Incident Response: Leveraging LLMs and Cyber Threat Intelligence [3.2284427438223013]
Security teams are overwhelmed by alert fatigue, high false-positive rates, and the vast volume of unstructured Cyber Threat Intelligence (CTI) documents. We introduce a novel Retrieval-Augmented Generation (RAG)-based framework that leverages Large Language Models (LLMs) to automate and enhance incident response (IR). Our approach introduces a hybrid retrieval mechanism that combines NLP-based similarity searches within a CTI vector database with standardized queries to external CTI platforms.
(2025-08-14T14:20:34Z)
- White-Basilisk: A Hybrid Model for Code Vulnerability Detection [50.49233187721795]
We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance. White-Basilisk achieves this performance on vulnerability detection tasks with only 200M parameters. This research establishes new benchmarks in code security and provides empirical evidence that compact, efficiently designed models can outperform larger counterparts in specialized tasks.
(2025-07-11T12:39:25Z)
- LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection [0.0]
Applying Large Language Models (LLMs) to vulnerability detection presents unique challenges. Previous attempts employing machine learning models for vulnerability detection have proven ineffective. We propose a robust AI-driven approach focused on mitigating these limitations.
(2025-04-25T15:30:40Z)
- Beyond the Surface: An NLP-based Methodology to Automatically Estimate CVE Relevance for CAPEC Attack Patterns [42.63501759921809]
We propose a methodology leveraging Natural Language Processing (NLP) to associate Common Vulnerabilities and Exposures (CVE) vulnerabilities with Common Attack Pattern Enumeration and Classification (CAPEC) attack patterns. Experimental evaluations demonstrate superior performance compared to state-of-the-art models.
(2025-01-13T08:39:52Z)
- Bringing Order Amidst Chaos: On the Role of Artificial Intelligence in Secure Software Engineering [0.0]
The ever-evolving technological landscape offers both opportunities and threats, creating a dynamic space where chaos and order compete. Secure software engineering (SSE) must continuously address vulnerabilities that endanger software systems. This thesis seeks to bring order to the chaos in SSE by addressing domain-specific differences that impact AI accuracy.
(2025-01-09T11:38:58Z)
- In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [104.94706600050557]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. We propose ICER, a novel red-teaming framework that generates interpretable and semantically meaningful problematic prompts. Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
(2024-11-25T04:17:24Z)
- BreachSeek: A Multi-Agent Automated Penetration Tester [0.0]
BreachSeek is an AI-driven multi-agent software platform that identifies and exploits vulnerabilities without human intervention.
In preliminary evaluations, BreachSeek successfully exploited vulnerabilities in exploitable machines within local networks.
Future developments aim to expand its capabilities, positioning it as an indispensable tool for cybersecurity professionals.
(2024-08-31T19:15:38Z)
- EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models serving as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
(2024-08-08T13:19:37Z)
- PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation [18.432274815853116]
This paper introduces PenHeal, a two-stage LLM-based framework designed to autonomously identify and remediate security vulnerabilities.
(2024-07-25T05:42:14Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
(2022-03-25T19:57:19Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to make the model more robust against various unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
(2021-01-28T16:38:26Z)