Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection
- URL: http://arxiv.org/abs/2407.16235v1
- Date: Tue, 23 Jul 2024 07:21:14 GMT
- Title: Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection
- Authors: Xin Zhou, Duc-Manh Tran, Thanh Le-Cong, Ting Zhang, Ivana Clairine Irsan, Joshua Sumarlin, Bach Le, David Lo,
- Abstract summary: Static Application Security Testing (SAST) is usually utilized to scan source code for security vulnerabilities.
Deep learning (DL)-based methods have demonstrated their potential in software vulnerability detection.
This paper compares 15 diverse SAST tools with 12 popular or state-of-the-art open-source LLMs in detecting software vulnerabilities.
- Score: 11.13802281700894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software vulnerabilities pose significant security challenges and potential risks to society, necessitating extensive efforts in automated vulnerability detection. There are two popular lines of work to address automated vulnerability detection. On one hand, Static Application Security Testing (SAST) is usually utilized to scan source code for security vulnerabilities, especially in industries. On the other hand, deep learning (DL)-based methods, especially since the introduction of large language models (LLMs), have demonstrated their potential in software vulnerability detection. However, there is no comparative study between SAST tools and LLMs, aiming to determine their effectiveness in vulnerability detection, understand the pros and cons of both SAST and LLMs, and explore the potential combination of these two families of approaches. In this paper, we compared 15 diverse SAST tools with 12 popular or state-of-the-art open-source LLMs in detecting software vulnerabilities from repositories of three popular programming languages: Java, C, and Python. The experimental results showed that SAST tools obtain low vulnerability detection rates with relatively low false positives, while LLMs can detect up 90\% to 100\% of vulnerabilities but suffer from high false positives. By further ensembling the SAST tools and LLMs, the drawbacks of both SAST tools and LLMs can be mitigated to some extent. Our analysis sheds light on both the current progress and future directions for software vulnerability detection.
Related papers
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - LSAST -- Enhancing Cybersecurity through LLM-supported Static Application Security Testing [5.644999288757871]
Large Language Models (LLMs) have demonstrated powerful code analysis capabilities, but their static training data and privacy risks limit their effectiveness.
We propose LSAST, a novel approach that integrates LLMs with SAST scanners to enhance vulnerability detection.
We set a new benchmark for static vulnerability analysis, offering a robust, privacy-conscious solution.
arXiv Detail & Related papers (2024-09-24T04:42:43Z) - The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z) - Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks.
LLMs in particular are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the output of the model.
We propose a method, based on Mechanistic Interpretability (MI) techniques, to guide this process.
arXiv Detail & Related papers (2024-07-29T09:55:34Z) - PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation [18.432274815853116]
PenHeal is a two-stage LLM-based framework designed to autonomously identify and security vulnerabilities.
This paper introduces PenHeal, a two-stage LLM-based framework designed to autonomously identify and security vulnerabilities.
arXiv Detail & Related papers (2024-07-25T05:42:14Z) - Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study [1.03590082373586]
We propose using large language models (LLMs) to assist in finding vulnerabilities in source code.
The aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies.
We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores.
arXiv Detail & Related papers (2024-05-24T14:59:19Z) - ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages [45.16862486631841]
Tool learning is widely acknowledged as a foundational approach or deploying large language models (LLMs) in real-world scenarios.
To fill this gap, we present *ToolSword*, a comprehensive framework dedicated to investigating safety issues linked to LLMs in tool learning.
arXiv Detail & Related papers (2024-02-16T15:19:46Z) - How Far Have We Gone in Vulnerability Detection Using Large Language
Models [15.09461331135668]
We introduce a comprehensive vulnerability benchmark VulBench.
This benchmark aggregates high-quality data from a wide range of CTF challenges and real-world applications.
We find that several LLMs outperform traditional deep learning approaches in vulnerability detection.
arXiv Detail & Related papers (2023-11-21T08:20:39Z) - Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities [12.82645410161464]
We evaluate the effectiveness of 16 pre-trained Large Language Models on 5,000 code samples from five diverse security datasets.
Overall, LLMs show modest effectiveness in detecting vulnerabilities, obtaining an average accuracy of 62.8% and F1 score of 0.71 across datasets.
We find that advanced prompting strategies that involve step-by-step analysis significantly improve performance of LLMs on real-world datasets in terms of F1 score (by upto 0.18 on average)
arXiv Detail & Related papers (2023-11-16T13:17:20Z) - Identifying the Risks of LM Agents with an LM-Emulated Sandbox [68.26587052548287]
Language Model (LM) agents and tools enable a rich set of capabilities but also amplify potential risks.
High cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks.
We introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios.
arXiv Detail & Related papers (2023-09-25T17:08:02Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.