Related papers: Learning to map source code to software vulnerability using code-as-a-graph

Learning to map source code to software vulnerability using code-as-a-graph

URL: http://arxiv.org/abs/2006.08614v1
Date: Mon, 15 Jun 2020 16:05:27 GMT
Title: Learning to map source code to software vulnerability using code-as-a-graph
Authors: Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Laredo, Alessandro Morari
Abstract summary: We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
Score: 67.62847721118142
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. Specifically, whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of relationships between nodes and edges. We create a pipeline we call AI4VA, which first encodes a sample source code into a Code Property Graph. The extracted graph is then vectorized in a manner which preserves its semantic information. A Gated Graph Neural Network is then trained using several such graphs to automatically extract templates differentiating the graph of a vulnerable sample from a healthy one. Our model outperforms static analyzers, classic machine learning, as well as CNN and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)

Related papers

Hierarchical Vector Quantized Graph Autoencoder with Annealing-Based Code Selection [13.731120424653705]
The Vector Quantized Variational Autoencoder (VQ-VAE) is a powerful autoencoder extensively used in fields such as computer vision. In this paper, we provide an empirical analysis of vector quantization in the context of graph autoencoders. We identify two key challenges associated with vector quantization when applying in graph data: codebook underutilization and codebook space sparsity.
arXiv Detail & Related papers (2025-04-17T07:43:52Z)
Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs. Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure. To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z)
Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules [81.05116895430375]
Masked graph modeling excels in the self-supervised representation learning of molecular graphs. We show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning. We propose a novel MGM method SimSGT, featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding strategy.
arXiv Detail & Related papers (2023-10-23T09:40:30Z)
Sequential Graph Neural Networks for Source Code Vulnerability Identification [5.582101184758527]
We present a properly curated C/C++ source code vulnerability dataset to aid in developing models. We also propose a learning framework based on graph neural networks, denoted SEquential Graph Neural Network (SEGNN) for learning a large number of code semantic representations. Our evaluations on two datasets and four baseline methods in a graph classification setting demonstrate state-of-the-art results.
arXiv Detail & Related papers (2023-05-23T17:25:51Z)
ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection [20.65271290295621]
We propose ReGVD, a graph network-based model for vulnerability detection. In particular, ReGVD views a given source code as a flat sequence of tokens. We obtain the highest accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection.
arXiv Detail & Related papers (2021-10-14T12:44:38Z)
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search [15.19181807445119]
We propose a learnable deep Graph for Code Search (called deGraphCS) to transfer source code into variable-based flow graphs. We collect a large-scale dataset from GitHub containing 41,152 code snippets written in C language.
arXiv Detail & Related papers (2021-03-24T06:57:44Z)
Heuristic Semi-Supervised Learning for Graph Generation Inspired by Electoral College [80.67842220664231]
We propose a novel pre-processing technique, namely ELectoral COllege (ELCO), which automatically expands new nodes and edges to refine the label similarity within a dense subgraph. In all setups tested, our method boosts the average score of base models by a large margin of 4.7 points, as well as consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2020-06-10T14:48:48Z)
Auto-decoding Graphs [91.3755431537592]
The generative model is an auto-decoder that learns to synthesize graphs from latent codes. Graphs are synthesized using self-attention modules that are trained to identify likely connectivity patterns.
arXiv Detail & Related papers (2020-06-04T14:23:01Z)
Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques use the source code as input and outputs a natural language description. We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.