Related papers: Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation

URL: http://arxiv.org/abs/2109.03341v1
Date: Tue, 7 Sep 2021 21:24:36 GMT
Title: Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation
Authors: Yufan Zhuang, Sahil Suneja, Veronika Thost, Giacomo Domeniconi, Alessandro Morari, Jim Laredo
Abstract summary: This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
Score: 57.92972327649165
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent to build static analyzers, yet insecure patterns are barely fully enumerated. This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program, in order to improve prediction performance. Compared with a generic GNN, our enhancements include a synthesis of multiple representations learned from the several parsed graphs of a program, and a new training loss metric that leverages the fine granularity of labeling. Our model outperforms multiple text, image and graph-based approaches, across two real-world datasets.

Related papers

Enhancing Software Vulnerability Detection Using Code Property Graphs and Convolutional Neural Networks [0.0]
This paper proposes a novel approach to detecting software vulnerabilities using a combination of code property graphs and machine learning techniques. We introduce various neural network models, including convolutional neural networks adapted for graph data, to process these representations. Our contributions include a methodology for transforming software code into code property graphs, the implementation of a convolutional neural network model for graph data, and the creation of a comprehensive dataset for training and evaluation.
arXiv Detail & Related papers (2025-03-23T19:12:07Z)
Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs. Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure. To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z)
CONCORD: Towards a DSL for Configurable Graph Code Representation [3.756550107432323]
We introduce CONCORD, a domain-specific language to build customizable graph representations. We demonstrate its effectiveness in code smell detection as an illustrative use case. CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z)
DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities [12.460745260973837]
Vulnerability detection is a critical problem in software security and attracts growing attention from both academia and industry. Recent advances in deep learning, especially Graph Neural Networks (GNN), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities. In this work, we are among the first to explore heterogeneous graph representation in the form of a Code Property Graph.
arXiv Detail & Related papers (2023-06-02T08:57:13Z)
GraphMAE: Self-Supervised Masked Graph Autoencoders [52.06140191214428]
We present a masked graph autoencoder, GraphMAE, that mitigates issues for generative self-supervised graph learning. We conduct extensive experiments on 21 public datasets for three different graph learning tasks. The results show that GraphMAE, a simple graph autoencoder with our careful designs, consistently outperforms both contrastive and generative state-of-the-art baselines.
arXiv Detail & Related papers (2022-05-22T11:57:08Z)
Towards Unsupervised Deep Graph Structure Learning [67.58720734177325]
We propose an unsupervised graph structure learning paradigm, where the learned graph topology is optimized by data itself without any external guidance. Specifically, we generate a learning target from the original data as an "anchor graph", and use a contrastive loss to maximize the agreement between the anchor graph and the learned graph.
arXiv Detail & Related papers (2022-01-17T11:57:29Z)
Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data. We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs). Compared with prior backdoor attacks, GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features. It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.