Vignat: Vulnerability identification by learning code semantics via
graph attention networks
- URL: http://arxiv.org/abs/2310.20067v1
- Date: Mon, 30 Oct 2023 22:31:38 GMT
- Title: Vignat: Vulnerability identification by learning code semantics via
graph attention networks
- Authors: Shuo Liu and Gail Kaiser
- Abstract summary: We propose Vignat, a novel attention-based framework for identifying vulnerabilities by learning graph-level semantic representations of code.
We represent codes with code property graphs (CPGs) in fine grain and use graph attention networks (GATs) for vulnerability detection.
- Score: 6.433019933439612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vulnerability identification is crucial to protect software systems from
attacks for cyber-security. However, huge projects have more than millions of
lines of code, and the complex dependencies make it hard to carry out
traditional static and dynamic methods. Furthermore, the semantic structure of
various types of vulnerabilities differs greatly and may occur simultaneously,
making general rule-based methods difficult to extend. In this paper, we
propose Vignat, a novel attention-based framework for identifying
vulnerabilities by learning graph-level semantic representations of code. We
represent codes with code property graphs (CPGs) in fine grain and use graph
attention networks (GATs) for vulnerability detection. The results show that
Vignat is able to achieve 57.38% accuracy on reliable datasets derived from
popular C libraries. Furthermore, the interpretability of our GATs provides
valuable insights into vulnerability patterns.
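The abstract names graph attention networks over code property graphs but gives no implementation detail, so the following is only a minimal sketch of a generic single-head graph attention layer with mean-pooling readout, not the authors' model. The toy three-node graph, the feature vectors, the weight matrix `W`, and the attention vector `a` are all hypothetical values chosen for illustration; extracting a real CPG from C code is not shown.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    # W has shape (out_dim, in_dim); h has length in_dim.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, edges, W, a):
    """Apply one single-head graph attention layer.

    features: list of node feature vectors
    edges: directed (src, dst) pairs; dst attends to src
    W: shared linear transform, shape (out_dim, in_dim)
    a: attention vector of length 2 * out_dim
    Returns (new_features, attention), where attention[i][j] is the
    weight node i places on neighbour j.
    """
    n = len(features)
    z = [matvec(W, h) for h in features]

    # Every node attends to itself (self-loop) plus its in-neighbours.
    neighbours = {i: {i} for i in range(n)}
    for src, dst in edges:
        neighbours[dst].add(src)

    out, attention = [], {}
    for i in range(n):
        js = sorted(neighbours[i])
        # Unnormalised score: LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j])))
            for j in js
        ]
        # Numerically stable softmax over the neighbourhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(js, alphas))
        # New feature: attention-weighted sum of transformed neighbours.
        out.append([
            sum(al * z[j][k] for al, j in zip(alphas, js))
            for k in range(len(z[i]))
        ])
    return out, attention


def graph_readout(node_features):
    # Mean pooling turns node features into one graph-level vector.
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


# Hypothetical toy graph standing in for a tiny CPG.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2), (0, 2)]
W = [[0.5, -0.2], [0.1, 0.3]]
a = [0.4, -0.1, 0.2, 0.3]

hidden, attention = gat_layer(features, edges, W, a)
embedding = graph_readout(hidden)
```

In a full pipeline the graph-level `embedding` would feed a classifier that predicts vulnerable versus not vulnerable, and the per-node `attention` weights are the kind of quantity one could inspect for interpretability.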
Related papers
- Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv Detail & Related papers (2024-04-15T09:10:52Z)
- The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors [33.395068754566935]
VULEXPLAINER is a tool for locating vulnerability-critical code lines from coarse-level vulnerable code snippets.
It can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
- Sequential Graph Neural Networks for Source Code Vulnerability Identification [5.582101184758527]
We present a properly curated C/C++ source code vulnerability dataset to aid in developing models.
We also propose a learning framework based on graph neural networks, denoted SEquential Graph Neural Network (SEGNN) for learning a large number of code semantic representations.
Our evaluations on two datasets and four baseline methods in a graph classification setting demonstrate state-of-the-art results.
arXiv Detail & Related papers (2023-05-23T17:25:51Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- Multi-context Attention Fusion Neural Network for Software Vulnerability Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently.
The model builds an accurate understanding of code semantics with far fewer learnable parameters.
The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Learn to Propagate Reliably on Noisy Affinity Graphs [69.97364913330989]
Recent works have shown that exploiting unlabeled data through label propagation can substantially reduce the labeling cost.
How to propagate labels reliably, especially on a dataset with unknown outliers, remains an open question.
We propose a new framework that allows labels to be propagated reliably on large-scale real-world data.
arXiv Detail & Related papers (2020-07-17T07:55:59Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)