LLbezpeky: Leveraging Large Language Models for Vulnerability Detection
- URL: http://arxiv.org/abs/2401.01269v2
- Date: Tue, 13 Feb 2024 17:56:24 GMT
- Title: LLbezpeky: Leveraging Large Language Models for Vulnerability Detection
- Authors: Noble Saji Mathews, Yelizaveta Brus, Yousra Aafer, Meiyappan Nagappan, Shane McIntosh
- Abstract summary: Large Language Models (LLMs) have shown tremendous potential in understanding semantics in human as well as programming languages.
We focus on building an AI-driven workflow to assist developers in identifying and rectifying vulnerabilities.
- Score: 10.330063887545398
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the continued research and progress in building secure systems,
Android applications continue to be ridden with vulnerabilities, necessitating
effective detection methods. Current strategies involving static and dynamic
analysis tools come with limitations, such as an overwhelming number of false
positives and a limited scope of analysis, which make either difficult to adopt.
Over the past years, machine learning based approaches have been extensively
explored for vulnerability detection, but their real-world applicability is
constrained by data requirements and feature engineering challenges. Large
Language Models (LLMs), with their vast parameters, have shown tremendous
potential in understanding semantics in human as well as programming languages.
We dive into the efficacy of LLMs for detecting vulnerabilities in the context
of Android security. We focus on building an AI-driven workflow to assist
developers in identifying and rectifying vulnerabilities. Our experiments show
that LLMs exceed our expectations in finding issues within applications,
correctly flagging insecure apps in 91.67% of cases in the Ghera benchmark. We
use inferences from our experiments towards building a robust and actionable
vulnerability detection system and demonstrate its effectiveness. Our
experiments also shed light on how various simple configurations can
affect the True Positive (TP) and False Positive (FP) rates.
Related papers
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Active Learning for Derivative-Based Global Sensitivity Analysis with Gaussian Processes [70.66864668709677]
We consider the problem of active learning for global sensitivity analysis of expensive black-box functions.
Since function evaluations are expensive, we use active learning to prioritize experimental resources where they yield the most value.
We propose novel active learning acquisition functions that directly target key quantities of derivative-based global sensitivity measures.
arXiv Detail & Related papers (2024-07-13T01:41:12Z) - Assessing the Effectiveness of LLMs in Android Application Vulnerability Analysis [0.0]
This study compares the ability of nine large language models (LLMs) to detect Android code vulnerabilities listed in the latest Open Worldwide Application Security Project (OWASP) Mobile Top 10.
Our analysis reveals the strengths and weaknesses of each LLM, identifying important factors that contribute to their performance.
arXiv Detail & Related papers (2024-06-27T05:14:34Z) - Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models [17.96542494363619]
Large language models (LLMs) have shown a remarkable capability in the comprehension of complicated context and content generation.
We propose LLMVulExp, a framework that utilizes LLMs for vulnerability detection and explanation.
We find that LLMVulExp can effectively enable the LLMs to perform vulnerability detection (e.g., over 90% F1 score on SeVC dataset) and explanation.
arXiv Detail & Related papers (2024-06-14T04:01:25Z) - VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models [12.465060623389151]
This study introduces a new benchmark, VulDetectBench, to assess the vulnerability detection capabilities of Large Language Models (LLMs).
The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty.
Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security.
arXiv Detail & Related papers (2024-06-11T13:42:57Z) - Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z) - Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics [54.57914943017522]
We highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.
arXiv Detail & Related papers (2024-02-15T22:01:45Z) - How Far Have We Gone in Vulnerability Detection Using Large Language Models [15.09461331135668]
We introduce a comprehensive vulnerability benchmark VulBench.
This benchmark aggregates high-quality data from a wide range of CTF challenges and real-world applications.
We find that several LLMs outperform traditional deep learning approaches in vulnerability detection.
arXiv Detail & Related papers (2023-11-21T08:20:39Z) - Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities [12.82645410161464]
Large Language Models (LLMs) have demonstrated remarkable performance on code-related tasks.
We evaluate whether pre-trained LLMs can detect security vulnerabilities and address the limitations of existing tools.
arXiv Detail & Related papers (2023-11-16T13:17:20Z) - Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.