Code Structure-Aware through Line-level Semantic Learning for Code Vulnerability Detection
- URL: http://arxiv.org/abs/2407.18877v1
- Date: Fri, 26 Jul 2024 17:15:58 GMT
- Title: Code Structure-Aware through Line-level Semantic Learning for Code Vulnerability Detection
- Authors: Ziliang Wang, Ge Li, Jia Li, Yihong Dong, Yingfei Xiong, Zhi Jin,
- Abstract summary: We propose a novel network architecture based on pre-trained code models, which incorporates structural information awareness.
We introduce a new network architecture, the Code Structure-Aware Network through Line-level Semantic Learning (CSLS), which integrates three key components: global vulnerability awareness, line-structural awareness, and sensitive-line awareness.
- Score: 44.29771620061153
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Different from the flow semantics of natural languages, programming languages are inherently rigid in structure and grammar. Existing fine-tuning methodologies for code vulnerability detection generally treat code as long text sequences, stripping away structural elements such as newlines ('/n') and whitespace. However, this approach inadvertently results in the loss of crucial structural information, diminishing the distinct characteristics of code and impairing the accuracy of vulnerability detection. To address these challenges, we propose a novel network architecture method based on pre-trained code models, which incorporates structural information awareness. We propose an enhanced code text processing workflow that retains structural elements prior to modeling. This refinement allows the model to retain and exploit line-level structural information and semantic information during the modeling process. Furthermore, we introduce a new network architecture, the Code Structure-Aware Network through Line-level Semantic Learning (CSLS), which integrates three key components: global vulnerability awareness, line-structural awareness, and sensitive-line awareness. We have conducted comprehensive experiments using vulnerability detection datasets from real-world projects. Extensive experiments were conducted on vulnerability detection datasets derived from real-world projects. The results demonstrate that our new code pre-processing flow significantly improves existing baselines (e.g., a 3\% accuracy improvement on the Devign dataset when applied to popular models such as CoderBert and UniXcoder). The proposed network architecture also demonstrates superior accuracy in detecting vulnerabilities, surpassing newly established benchmarks. These findings underscore the importance of structural information in enhancing the efficacy of code vulnerability detection models.
Related papers
- DFEPT: Data Flow Embedding for Enhancing Pre-Trained Model Based Vulnerability Detection [7.802093464108404]
We propose a data flow embedding technique to enhance the performance of pre-trained models in vulnerability detection tasks.
Specifically, we parse data flow graphs from function-level source code, and use the data type of the variable as the node characteristics of the DFG.
Our research shows that DFEPT can provide effective vulnerability semantic information to pre-trained models, achieving an accuracy of 64.97% on the Devign dataset and an F1-Score of 47.9% on the Reveal dataset.
arXiv Detail & Related papers (2024-10-24T07:05:07Z) - Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv Detail & Related papers (2024-04-15T09:10:52Z) - Structure-Aware Code Vulnerability Analysis With Graph Neural Networks [0.0]
This study explores the effectiveness of graph neural networks (GNNs) for vulnerability detection in software code.
The primary focus is to evaluate the general applicability of GNNs in identifying vulnerable code segments and distinguishing these from their fixed versions.
Experiments indicate that certain model configurations, such as the pruning of specific graph elements and the exclusion of certain types of code representation, significantly improve performance.
arXiv Detail & Related papers (2023-07-21T09:35:29Z) - DCDetector: An IoT terminal vulnerability mining system based on
distributed deep ensemble learning under source code representation [2.561778620560749]
The goal of the research is to intelligently detect vulnerabilities in source codes of high-level languages such as C/C++.
This enables us to propose a code representation of sensitive sentence-related slices of source code, and to detect vulnerabilities by designing a distributed deep ensemble learning model.
Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning.
arXiv Detail & Related papers (2022-11-29T14:19:14Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Multi-context Attention Fusion Neural Network for Software Vulnerability
Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently.
The model builds an accurate understanding of code semantics with a lot less learnable parameters.
The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised
Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - Understanding and Diagnosing Vulnerability under Adversarial Attacks [62.661498155101654]
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks.
We propose a novel interpretability method, InterpretGAN, to generate explanations for features used for classification in latent variables.
We also design the first diagnostic method to quantify the vulnerability contributed by each layer.
arXiv Detail & Related papers (2020-07-17T01:56:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.