Related papers: Real-VulLLM: An LLM Based Assessment Framework in the Wild

URL: http://arxiv.org/abs/2510.04056v1
Date: Sun, 05 Oct 2025 06:34:30 GMT
Title: Real-VulLLM: An LLM Based Assessment Framework in the Wild
Authors: Rijha Safdar, Danyail Mateen, Syed Taha Ali, Wajahat Hussain
Abstract summary: Large Language Models (LLMs) have demonstrated exceptional progress in software engineering. Their capability for vulnerability detection in the wild scenario and its corresponding reasoning remains underexplored. Our contributions are (i) varied prompt designs for vulnerability detection and its corresponding reasoning in the wild.
Score: 0.7408058999454915
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Artificial Intelligence (AI), and more specifically Large Language Models (LLMs), have demonstrated exceptional progress in multiple areas including software engineering; however, their capability for vulnerability detection in the wild and its corresponding reasoning remains underexplored. Prompting pre-trained LLMs in an effective way offers a computationally effective and scalable solution. Our contributions are (i) varied prompt designs for vulnerability detection and its corresponding reasoning in the wild, (ii) a real-world vector data store constructed from the National Vulnerability Database that will provide real-time context to the vulnerability detection framework, and (iii) a scoring measure for combined measurement of accuracy and reasoning quality. Our contribution aims to examine whether LLMs are ready for wild deployment, thus enabling the reliable use of LLMs for the development of secure software.

Related papers

On the Effectiveness of Instruction-Tuning Local LLMs for Identifying Software Vulnerabilities [0.7136933021609079]
Large Language Models (LLMs) show significant promise in automating software vulnerability analysis. Current approaches in using LLMs to automate vulnerability analysis mostly rely on using online API-based LLM services. This paper addresses these limitations by reformulating the problem as Software Vulnerability Identification (SVI). We show that instruct-tuned local models represent a more effective, secure, and practical approach for leveraging LLMs in real-world vulnerability management.
arXiv Detail & Related papers (2025-12-23T05:30:53Z)
Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities [2.5190317156807924]
We empirically evaluate the patching effectiveness and complementarity of several prominent Large Language Models (LLMs). Our results reveal that LLMs patch real vulnerabilities more effectively compared to artificial ones. Our analysis reveals significant variability across LLMs in terms of overlapping (multiple LLMs patching the same vulnerabilities) and complementarity.
arXiv Detail & Related papers (2025-11-28T18:03:47Z)
Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions [74.35421055079655]
Large language models (LLMs) have enabled an emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. Mobile Edge General Intelligence (MEGI) brings real-time, privacy-preserving reasoning to the network edge. We propose a joint optimization framework for efficient LLM reasoning deployment in MEGI.
arXiv Detail & Related papers (2025-09-27T10:53:48Z)
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code [49.009041488527544]
A.S.E is a repository-level evaluation benchmark for assessing the security of AI-generated code. Current large language models (LLMs) still struggle with secure coding. A larger reasoning budget does not necessarily lead to better code generation.
arXiv Detail & Related papers (2025-08-25T15:11:11Z)
ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models [60.28667314609623]
Large Language Models (LLMs) are increasingly deployed as black-box components in real-world applications. We propose Reality-Oriented Safety Evaluation (ROSE), a novel framework that uses multi-objective reinforcement learning to fine-tune an adversarial LLM.
arXiv Detail & Related papers (2025-06-17T10:55:17Z)
LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection [0.0]
Using Large Language Models (LLMs) for vulnerability detection presents unique challenges. Previous attempts employing machine learning models for vulnerability detection have proven ineffective. We propose a robust AI-driven approach focused on mitigating these limitations.
arXiv Detail & Related papers (2025-04-25T15:30:40Z)
VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching [0.9208007322096533]
Large Language Models (LLMs) have shown promise in tasks like code translation. This paper introduces VulnLLMEval, a framework designed to assess the performance of LLMs in identifying and patching vulnerabilities in C code. Our study includes 307 real-world vulnerabilities extracted from the Linux kernel.
arXiv Detail & Related papers (2024-09-16T22:00:20Z)
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
Towards Explainable Vulnerability Detection with Large Language Models [14.243344783348398]
Software vulnerabilities pose significant risks to the security and integrity of software systems. The advent of large language models (LLMs) has introduced transformative potential due to their advanced generative capabilities. In this paper, we propose LLMVulExp, an automated framework designed to specialize LLMs for the dual tasks of vulnerability detection and explanation.
arXiv Detail & Related papers (2024-06-14T04:01:25Z)
Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress. Our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models [107.82336341926134]
SALAD-Bench is a safety benchmark specifically designed for evaluating Large Language Models (LLMs). It transcends conventional benchmarks through its large scale, rich diversity, intricate taxonomy spanning three levels, and versatile functionalities.
arXiv Detail & Related papers (2024-02-07T17:33:54Z)