Exploring RAG-based Vulnerability Augmentation with LLMs
- URL: http://arxiv.org/abs/2408.04125v1
- Date: Wed, 7 Aug 2024 23:22:58 GMT
- Title: Exploring RAG-based Vulnerability Augmentation with LLMs
- Authors: Seyed Shayan Daneshvar, Yu Nong, Xu Yang, Shaowei Wang, Haipeng Cai,
- Abstract summary: Large language models (LLMs) are being used for solving various code generation and comprehension tasks.
In this study, we explore three different strategies to augment vulnerabilities, with LLMs, namely Mutation, Injection, and Extension.
Our results show that our injection-based clustering-enhanced RAG method beats the baseline setting (NoAug), Vulgen, and VGX (two SOTA methods), and Random Oversampling (ROS) by 30.80%, 27.48%, 27.93%, and 15.41% in f1-score with 5K generated vulnerable samples.
- Score: 19.45598962972431
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting vulnerabilities is a crucial task for maintaining the integrity, availability, and security of software systems. Utilizing DL-based models for vulnerability detection has become commonplace in recent years. However, such deep learning-based vulnerability detectors (DLVD) suffer from a shortage of sizable datasets to train effectively. Data augmentation can potentially alleviate the shortage of data, but augmenting vulnerable code is challenging and requires designing a generative solution that maintains vulnerability. Hence, the work on generating vulnerable code samples has been limited and previous works have only focused on generating samples that contain single statements or specific types of vulnerabilities. Lately, large language models (LLMs) are being used for solving various code generation and comprehension tasks and have shown inspiring results, especially when fused with retrieval augmented generation (RAG). In this study, we explore three different strategies to augment vulnerabilities both single and multi-statement vulnerabilities, with LLMs, namely Mutation, Injection, and Extension. We conducted an extensive evaluation of our proposed approach on three vulnerability datasets and three DLVD models, using two LLMs. Our results show that our injection-based clustering-enhanced RAG method beats the baseline setting (NoAug), Vulgen, and VGX (two SOTA methods), and Random Oversampling (ROS) by 30.80\%, 27.48\%, 27.93\%, and 15.41\% in f1-score with 5K generated vulnerable samples on average, and 53.84\%, 54.10\%, 69.90\%, and 40.93\% with 15K generated vulnerable samples. Our approach demonstrates its feasibility for large-scale data augmentation by generating 1K samples at as cheap as US$ 1.88.
Related papers
- Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [63.603861880022954]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability.
Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs.
It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z) - Uncovering Weaknesses in Neural Code Generation [21.552898575210534]
We assess the quality of generated code using match-based and execution-based metrics, then conduct thematic analysis to develop a taxonomy of nine types of weaknesses.
In the CoNaLa dataset, inaccurate prompts are a notable problem, causing all large models to fail in 26.84% of cases.
Missing pivotal semantics is a pervasive issue across benchmarks, with one or more large models omitting key semantics in 65.78% of CoNaLa tasks.
All models struggle with proper API usage, a challenge amplified by vague or complex prompts.
arXiv Detail & Related papers (2024-07-13T07:31:43Z) - AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses.
Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies.
We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z) - Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv Detail & Related papers (2024-04-15T09:10:52Z) - Camouflage is all you need: Evaluating and Enhancing Language Model
Robustness Against Camouflage Adversarial Attacks [53.87300498478744]
Adversarial attacks represent a substantial challenge in Natural Language Processing (NLP)
This study undertakes a systematic exploration of this challenge in two distinct phases: vulnerability evaluation and resilience enhancement.
Results suggest a trade-off between performance and robustness, with some models maintaining similar performance while gaining robustness.
arXiv Detail & Related papers (2024-02-15T10:58:22Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities [12.82645410161464]
We evaluate the effectiveness of 16 pre-trained Large Language Models on 5,000 code samples from five diverse security datasets.
Overall, LLMs show modest effectiveness in detecting vulnerabilities, obtaining an average accuracy of 62.8% and F1 score of 0.71 across datasets.
We find that advanced prompting strategies that involve step-by-step analysis significantly improve performance of LLMs on real-world datasets in terms of F1 score (by upto 0.18 on average)
arXiv Detail & Related papers (2023-11-16T13:17:20Z) - VGX: Large-Scale Sample Generation for Boosting Learning-Based Software
Vulnerability Analyses [30.65722096096949]
This paper proposes VGX, a new technique aimed for large-scale generation of high-quality vulnerability datasets.
VGX materializes vulnerability-injection code editing in identified contexts using patterns of such edits.
For in-the-wild sample production, VGX generated 150,392 vulnerable samples, from which we randomly chose 10% to assess how much these samples help vulnerability detection, localization, and repair.
arXiv Detail & Related papers (2023-10-24T01:05:00Z) - Text generation for dataset augmentation in security classification
tasks [55.70844429868403]
This study evaluates the application of natural language text generators to fill this data gap in multiple security-related text classification tasks.
We find substantial benefits for GPT-3 data augmentation strategies in situations with severe limitations on known positive-class samples.
arXiv Detail & Related papers (2023-10-22T22:25:14Z) - DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based
Vulnerability Detection [29.52887618905746]
This dataset contains 18,945 vulnerable functions spanning 150 CWEs and 330,492 non-vulnerable functions extracted from 7,514 commits.
Our results show that deep learning is still not ready for vulnerability detection, due to high false positive rate, low F1 score, and difficulty of detecting hard CWEs.
We demonstrate that large language models (LLMs) are a promising research direction for ML-based vulnerability detection, outperforming Graph Neural Networks (GNNs) with code-structure features.
arXiv Detail & Related papers (2023-04-01T23:29:14Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.