Scalable Defect Detection via Traversal on Code Graph
- URL: http://arxiv.org/abs/2406.08098v1
- Date: Wed, 12 Jun 2024 11:24:52 GMT
- Title: Scalable Defect Detection via Traversal on Code Graph
- Authors: Zhengyao Liu, Xitong Zhong, Xingjing Deng, Shuo Hong, Xiang Gao, Hailong Sun,
- Abstract summary: We introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities.
It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency.
For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL.
- Score: 10.860910384163892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of code structure and semantics. Despite the progress, existing graph-based analysis tools still face performance and scalability issues. The main bottleneck lies in the size and complexity of CPG, which makes analyzing large codebases inefficient and memory-consuming. Also, query rules used by the current tools can be over-specific. Hence, we introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities. It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency. Based on the CPG, it also offers a declarative query language to simplify the queries. Furthermore, it takes a step forward to integrate machine learning to enhance the generality of vulnerability detection. For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL.
Related papers
- Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-11-26T18:38:38Z)
- Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection [50.7263393517558]
We introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR).
Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design.
The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks.
arXiv Detail & Related papers (2024-03-23T15:49:13Z)
- CONCORD: Towards a DSL for Configurable Graph Code Representation [3.756550107432323]
We introduce CONCORD, a domain-specific language to build customizable graph representations.
We demonstrate its effectiveness in code smell detection as an illustrative use case.
CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z)
- It Is Time To Steer: A Scalable Framework for Analysis-driven Attack Graph Generation [50.06412862964449]
Attack Graph (AG) represents the best-suited solution to support cyber risk assessment for multi-step attacks on computer networks.
Current solutions propose to address the generation problem from the algorithmic perspective and postulate the analysis only after the generation is complete.
This paper rethinks the classic AG analysis through a novel workflow in which the analyst can query the system anytime.
arXiv Detail & Related papers (2023-12-27T10:44:58Z)
- Towards Self-Interpretable Graph-Level Anomaly Detection [73.1152604947837]
Graph-level anomaly detection (GLAD) aims to identify graphs that exhibit notable dissimilarity compared to the majority in a collection.
We propose a Self-Interpretable Graph aNomaly dETection model (SIGNET) that detects anomalous graphs and simultaneously generates informative explanations.
arXiv Detail & Related papers (2023-10-25T10:10:07Z)
- A Graph Encoder-Decoder Network for Unsupervised Anomaly Detection [7.070726553564701]
We propose an unsupervised graph encoder-decoder model to detect abnormal nodes from graphs.
In the encoding stage, we design a novel pooling mechanism, named LCPool, to find a cluster assignment matrix.
In the decoding stage, we propose an unpooling operation, called LCUnpool, to reconstruct both the structure and nodal features of the original graph.
arXiv Detail & Related papers (2023-08-15T13:49:12Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network [8.420666984519826]
We propose a novel solution named GraphEye to identify whether a function of C/C++ code has vulnerabilities.
VecCPG is a vectorization for the code property graph, which is proposed to characterize the key syntax and semantic features of the corresponding source code.
GcGAT is a deep learning model based on the graph attention network, which is proposed to solve the graph classification problem.
arXiv Detail & Related papers (2022-02-05T07:03:15Z)
- Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction [6.467090475885797]
A graph is one of the most commonly used representations for understanding relational data.
In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph.
arXiv Detail & Related papers (2022-01-25T07:20:47Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.