DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities
- URL: http://arxiv.org/abs/2306.01376v3
- Date: Thu, 6 Jun 2024 02:06:55 GMT
- Title: DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities
- Authors: Tiehua Zhang, Rui Xu, Jianping Zhang, Yuze Liu, Xin Chen, Jun Yin, Xi Zheng,
- Abstract summary: Vulnerability detection is a critical problem in software security and attracts growing attention both from academia and industry.
Recent advances in deep learning, especially Graph Neural Networks (GNN), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities.
In this work, we are one of the first to explore heterogeneous graph representation in the form of Code Property Graph.
- Score: 12.460745260973837
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vulnerability detection is a critical problem in software security and attracts growing attention both from academia and industry. Traditionally, software security is safeguarded by designated rule-based detectors that heavily rely on empirical expertise, requiring tremendous effort from software experts to generate rule repositories for large code corpus. Recent advances in deep learning, especially Graph Neural Networks (GNN), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities. However, prior learning-based works only break programs down into a sequence of word tokens for extracting contextual features of codes, or apply GNN largely on homogeneous graph representation (e.g., AST) without discerning complex types of underlying program entities (e.g., methods, variables). In this work, we are one of the first to explore heterogeneous graph representation in the form of Code Property Graph and adapt a well-known heterogeneous graph network with a dual-supervisor structure for the corresponding graph learning task. Using the prototype built, we have conducted extensive experiments on both synthetic datasets and real-world projects. Compared with the state-of-the-art baselines, the results demonstrate promising effectiveness in this research direction in terms of vulnerability detection performance (average F1 improvements over 10\% in real-world projects) and transferability from C/C++ to other programming languages (average F1 improvements over 11%).
Related papers
- KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data [13.510494408303536]
KnowGraph integrates domain knowledge with data-driven learning for enhanced graph-based anomaly detection.
Tests show KnowGraph consistently outperforms state-of-the-art baselines in both transductive and inductive settings.
Results highlight the potential of integrating domain knowledge into data-driven models for high-stakes, graph-based security applications.
arXiv Detail & Related papers (2024-10-10T21:53:33Z) - Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection [4.629503670145618]
Software vulnerabilities are a challenge in cybersecurity.
DeepEXE is an agent-based implicit neural network that mimics the execution path of a program.
We show that DeepEXE is an accurate and efficient method and outperforms the state-of-the-art vulnerability detection methods.
arXiv Detail & Related papers (2024-04-03T22:07:50Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - Learning Strong Graph Neural Networks with Weak Information [64.64996100343602]
We develop a principled approach to the problem of graph learning with weak information (GLWI)
We propose D$2$PT, a dual-channel GNN framework that performs long-range information propagation on the input graph with incomplete structure, but also on a global graph that encodes global semantic similarities.
arXiv Detail & Related papers (2023-05-29T04:51:09Z) - Sequential Graph Neural Networks for Source Code Vulnerability
Identification [5.582101184758527]
We present a properly curated C/C++ source code vulnerability dataset to aid in developing models.
We also propose a learning framework based on graph neural networks, denoted SEquential Graph Neural Network (SEGNN) for learning a large number of code semantic representations.
Our evaluations on two datasets and four baseline methods in a graph classification setting demonstrate state-of-the-art results.
arXiv Detail & Related papers (2023-05-23T17:25:51Z) - Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications.
We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z) - ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection [20.65271290295621]
We propose ReGVD, a graph network-based model for vulnerability detection.
In particular, ReGVD views a given source code as a flat sequence of tokens.
We obtain the highest accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection.
arXiv Detail & Related papers (2021-10-14T12:44:38Z) - Deep Fraud Detection on Non-attributed Graph [61.636677596161235]
Graph Neural Networks (GNNs) have shown solid performance on fraud detection.
labeled data is scarce in large-scale industrial problems, especially for fraud detection.
We propose a novel graph pre-training strategy to leverage more unlabeled data.
arXiv Detail & Related papers (2021-10-04T03:42:09Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Graph Representation Learning via Graphical Mutual Information
Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.