Related papers: SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection

URL: http://arxiv.org/abs/2409.00882v1
Date: Mon, 2 Sep 2024 00:49:02 GMT
Title: SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection
Authors: Van Nguyen, Surya Nepal, Tingmin Wu, Xingliang Yuan, Carsten Rudolph
Abstract summary: Software vulnerabilities (SVs) have emerged as a prevalent and critical concern for safety-critical security systems. We propose a novel framework that enhances the capability of large language models to learn and utilize semantic and syntactic relationships from source code data for SVD.
Score: 23.7268575752712
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Software vulnerabilities (SVs) have emerged as a prevalent and critical concern for safety-critical security systems. This has spurred significant advancements in utilizing AI-based methods, including machine learning and deep learning, for software vulnerability detection (SVD). While AI-based methods have shown promising performance in SVD, their effectiveness on real-world, complex, and diverse source code datasets remains limited in practice. To tackle this challenge, in this paper, we propose a novel framework that enhances the capability of large language models to learn and utilize semantic and syntactic relationships from source code data for SVD. As a result, our approach can enable the acquisition of fundamental knowledge from source code data while adeptly utilizing crucial relationships, i.e., semantic and syntactic associations, to effectively address the software vulnerability detection (SVD) problem. The rigorous and extensive experimental results on three real-world challenging datasets (i.e., ReVeal, D2A, and Devign) demonstrate the superiority of our approach over the effective and state-of-the-art baselines. In summary, on average, our SAFE approach achieves higher performances from 4.79% to 9.15% for F1-measure and from 16.93% to 21.70% for Recall compared to the baselines across all datasets used.

Related papers

ReVul-CoT: Towards Effective Software Vulnerability Assessment with Retrieval-Augmented Generation and Chain-of-Thought Prompting [9.735224996021591]
We propose a novel framework that integrates Retrieval-Augmented Generation (RAG) with Chain-of-Thought (CoT) prompting. In ReVul-CoT, the RAG module dynamically retrieves contextually relevant information from a constructed local knowledge base. Building on DeepSeek-V3.1, CoT prompting guides the LLM to perform step-by-step reasoning over exploitability, impact scope, and related factors.
arXiv Detail & Related papers (2025-11-21T08:01:49Z)
Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation [69.8237598448941]
This study investigates the potential of ensemble learning to enhance the performance of Large Language Models (LLMs) in source code vulnerability detection. We propose Dynamic Gated Stacking (DGS), a Stacking variant tailored for vulnerability detection.
arXiv Detail & Related papers (2025-09-16T03:48:22Z)
Improving vulnerability type prediction and line-level detection via adversarial training-based data augmentation and multi-task learning [10.375389754684905]
We propose a unified approach that integrates Embedding-Layer Driven Adversarial Training (EDAT) with Multi-task Learning (MTL). Our proposed approach outperforms state-of-the-art baselines on both Vulnerability Type Prediction (VTP) and Line-level Vulnerability Detection (LVD) tasks.
arXiv Detail & Related papers (2025-06-30T05:47:09Z)
Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data [22.557961978833386]
We propose a novel framework for large language models (LLMs) that excels at mining vulnerability patterns. Specifically, we construct forward and backward reasoning processes for vulnerability and corresponding fixed code, ensuring the synthesis of high-quality reasoning data. We show that ReVD sets a new state of the art for LLM-based software vulnerability detection, e.g., a 12.24%-22.77% improvement in accuracy.
arXiv Detail & Related papers (2025-06-09T03:25:23Z)
Breaking Focus: Contextual Distraction Curse in Large Language Models [68.4534308805202]
We investigate a critical vulnerability in Large Language Models (LLMs). This phenomenon arises when models fail to maintain consistent performance on questions modified with semantically coherent but irrelevant context. We propose an efficient tree-based search methodology to automatically generate CDV examples.
arXiv Detail & Related papers (2025-02-03T18:43:36Z)
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [97.82118821263825]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. We propose ICER, a novel red-teaming framework that generates interpretable and semantically meaningful problematic prompts. Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
arXiv Detail & Related papers (2024-11-25T04:17:24Z)
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [94.13848736705575]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms. We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels. Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z)
SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI [58.29510889419971]
Existing benchmarks for evaluating the security risks and capabilities of code-generating large language models (LLMs) face several key limitations. We introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations. Applying this framework to Python, C/C++, and Java, we build SeCodePLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities.
arXiv Detail & Related papers (2024-10-14T21:17:22Z)
Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation [4.374800396968465]
We propose a data augmentation technique aimed at enhancing the performance of pre-trained language models for vulnerability detection. By incorporating our augmented dataset in fine-tuning a series of representative code pre-trained models, up to 10.1% increase in accuracy and 23.6% increase in F1 can be achieved.
arXiv Detail & Related papers (2024-09-30T21:44:05Z)
M2CVD: Enhancing Vulnerability Semantic through Multi-Model Collaboration for Code Vulnerability Detection [52.4455893010468]
Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization. Code models such as CodeBERT are easy to fine-tune, but they often struggle to learn vulnerability semantics from complex code languages. This paper introduces the Multi-Model Collaborative Vulnerability Detection approach (M2CVD) to improve the detection accuracy of code models.
arXiv Detail & Related papers (2024-06-10T00:05:49Z)
Deep Learning-Based Out-of-distribution Source Code Data Identification: How Far Have We Gone? [23.962076093344166]
We propose an innovative deep learning-based approach addressing the OOD source code data identification problem. Our method is derived from an information-theoretic perspective with the use of innovative cluster-contrastive learning. Our method achieves significantly higher performance, by around 15.27%, 7.39%, and 4.93% on the FPR, AUROC, and AUPR measures, respectively.
arXiv Detail & Related papers (2024-04-09T02:52:55Z)
Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities [21.787125867708962]
Large language models (LLMs) have demonstrated impressive potential in various domains. In this paper, we explore how to leverage LLMs and chain-of-thought (CoT) prompting to address three key software vulnerability analysis tasks. We show substantial superiority of our CoT-inspired prompting over the baselines.
arXiv Detail & Related papers (2024-02-27T05:48:18Z)
A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies. Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance. Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews. We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z)
Cross Project Software Vulnerability Detection via Domain Adaptation and Max-Margin Principle [21.684043656053106]
Software vulnerabilities (SVs) have become a common, serious and crucial concern due to the ubiquity of computer software. We propose a novel end-to-end approach to tackle these two crucial issues. Our method improves the F1-measure, the most important measure in SVD, by 1.83% to 6.25% over the second-best method on the datasets used.
arXiv Detail & Related papers (2022-09-19T23:47:22Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.