Scalable Defect Detection via Traversal on Code Graph
- URL: http://arxiv.org/abs/2406.08098v1
- Date: Wed, 12 Jun 2024 11:24:52 GMT
- Title: Scalable Defect Detection via Traversal on Code Graph
- Authors: Zhengyao Liu, Xitong Zhong, Xingjing Deng, Shuo Hong, Xiang Gao, Hailong Sun,
- Abstract summary: We introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities.
It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency.
For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL.
- Score: 10.860910384163892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of code structure and semantics. Despite the progress, existing graph-based analysis tools still face performance and scalability issues. The main bottleneck lies in the size and complexity of CPG, which makes analyzing large codebases inefficient and memory-consuming. Also, query rules used by the current tools can be over-specific. Hence, we introduce QVoG, a graph-based static analysis platform for detecting defects and vulnerabilities. It employs a compressed CPG representation to maintain a reasonable graph size, thereby enhancing the overall query efficiency. Based on the CPG, it also offers a declarative query language to simplify the queries. Furthermore, it takes a step forward to integrate machine learning to enhance the generality of vulnerability detection. For projects consisting of 1,000,000+ lines of code, QVoG can complete analysis in approximately 15 minutes, as opposed to 19 minutes with CodeQL.
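To make the abstract's central idea concrete, the following is a minimal sketch of defect detection as a traversal over a code graph. It is not QVoG's implementation or its query language. The toy graph, the node descriptions, and the `find_flows` helper are all hypothetical, chosen only to illustrate a source-to-sink reachability query over data-flow edges with an optional sanitizer barrier.

```python
from collections import deque

# Hypothetical toy code graph: node id -> (kind, code text).
NODES = {
    1: ("call", "user = input()"),
    2: ("assign", "query = 'SELECT ' + user"),
    3: ("call", "safe = escape(user)"),
    4: ("call", "execute(query)"),
    5: ("call", "log(safe)"),
}

# Hypothetical data-flow edges: the value defined at the key flows to each listed node.
DATA_FLOW = {1: [2, 3], 2: [4], 3: [5]}


def find_flows(sources, sinks, barriers=frozenset()):
    """Breadth-first traversal returning every path from a source to a sink
    that does not pass through a barrier (for example, a sanitizer) node."""
    found = []
    for src in sources:
        queue = deque([[src]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                found.append(path)
                continue
            for nxt in DATA_FLOW.get(node, []):
                # Skip barrier nodes and avoid revisiting nodes on the current path.
                if nxt not in barriers and nxt not in path:
                    queue.append(path + [nxt])
    return found


# Report flows from the untrusted input (node 1) to either sink,
# treating the escape() call (node 3) as a sanitizer.
flows = find_flows(sources={1}, sinks={4, 5}, barriers={3})
for path in flows:
    print(" -> ".join(NODES[n][1] for n in path))
```

With node 3 treated as a barrier, only the unsanitized path into `execute(query)` is reported. A production tool would run such queries over a much richer graph that combines syntax, control flow, and data flow, which is where graph size starts to dominate query cost.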
Related papers
- BugSweeper: Function-Level Detection of Smart Contract Vulnerabilities Using Graph Neural Networks [3.9933521189187693]
We introduce BugSweeper, an end-to-end deep learning framework that detects vulnerabilities directly from the source code without manual engineering. BugSweeper represents each Solidity function as a Function-Level Abstract Syntax Graph (FLAG), a novel graph that combines its Abstract Syntax Tree (AST) with enriched control-flow and data-flow semantics. Our two-stage Graph Neural Network (GNN) filters noise from the syntax graphs, while the second-stage GNN conducts high-level reasoning to detect diverse vulnerabilities.
arXiv Detail & Related papers (2025-12-10T07:30:03Z) - Robust Graph Condensation via Classification Complexity Mitigation [61.22258715077984]
Graph condensation is an intrinsic-dimension-reducing process, synthesizing a condensed graph with lower classification complexity. We introduce three graph data manifold learning modules that guide the condensed graph to lie within a smooth, low-dimensional manifold. Experiments demonstrate the robustness of ModelName across diverse attack scenarios.
arXiv Detail & Related papers (2025-10-30T12:55:21Z) - ProGQL: A Provenance Graph Query System for Cyber Attack Investigation [6.954627558521413]
Provenance analysis (PA) has emerged as an important solution for cyber attack investigation. Existing PA techniques are inflexible and non-extensible, making it difficult to incorporate analyst expertise. We propose the ProGQL framework, which provides a domain-specific graph search language with a well-engineered query engine.
arXiv Detail & Related papers (2025-10-25T18:53:49Z) - Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching [61.824094419641575]
Large Language Models (LLMs) struggle with hallucinations and factual errors in knowledge-intensive scenarios like knowledge graph question answering (KGQA). We attribute this to the semantic gap between structured knowledge graphs (KGs) and unstructured queries, caused by inherent differences in their focuses and structures. Existing methods usually employ resource-intensive, non-scalable reasoning on vanilla KGs, but overlook this gap. We propose a flexible framework, Enrich-on-Graph (EoG), which leverages LLMs' prior knowledge to enrich KGs and bridge the semantic gap between graphs and queries.
arXiv Detail & Related papers (2025-09-25T06:48:52Z) - Zero-shot Graph Reasoning via Retrieval Augmented Framework with LLMs [15.558119182035995]
We propose a new, training-free method, Graph Reasoning via Retrieval Augmented Framework (GRRAF). GRRAF harnesses retrieval-augmented generation (RAG) alongside the code-generation capabilities of large language models (LLMs) to address a wide range of graph reasoning tasks. Experimental evaluations on the GraphInstruct dataset reveal that GRRAF achieves 100% accuracy on most graph reasoning tasks.
arXiv Detail & Related papers (2025-09-16T06:58:58Z) - LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models [2.891351178680099]
This paper presents a novel framework integrating Code Property Graphs (CPG) with Large Language Models (LLM) for robust vulnerability detection. Our approach's ability to provide a more concise and accurate representation of code snippets enables the analysis of larger code segments. Empirical evaluation demonstrates LLMxCPG's effectiveness across verified datasets, achieving 15-40% improvements in F1-score over state-of-the-art baselines.
arXiv Detail & Related papers (2025-07-22T13:36:33Z) - Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications.
arXiv Detail & Related papers (2025-05-20T06:44:34Z) - RGL: A Graph-Centric, Modular Framework for Efficient Retrieval-Augmented Generation on Graphs [58.10503898336799]
We introduce the RAG-on-Graphs Library (RGL), a modular framework that seamlessly integrates the complete RAG pipeline.
RGL addresses key challenges by supporting a variety of graph formats and integrating optimized implementations for essential components.
Our evaluations demonstrate that RGL not only accelerates the prototyping process but also enhances the performance and applicability of graph-based RAG systems.
arXiv Detail & Related papers (2025-03-25T03:21:48Z) - Accelerating Hardware Verification with Graph Models [0.43988112145759295]
We introduce GraphFuzz, a graph-based hardware fuzzer for gate-level netlist verification.
In this approach, hardware designs are modeled as graph nodes, with gate behaviors encoded as features.
By leveraging graph learning algorithms, GraphFuzz efficiently detects hardware vulnerabilities by analyzing node patterns.
arXiv Detail & Related papers (2024-12-17T23:14:46Z) - Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-11-26T18:38:38Z) - Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection [50.7263393517558]
We introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR).
Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design.
The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks.
arXiv Detail & Related papers (2024-03-23T15:49:13Z) - CONCORD: Towards a DSL for Configurable Graph Code Representation [3.756550107432323]
We introduce CONCORD, a domain-specific language to build customizable graph representations.
We demonstrate its effectiveness in code smell detection as an illustrative use case.
CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z) - It Is Time To Steer: A Scalable Framework for Analysis-driven Attack Graph Generation [50.06412862964449]
Attack Graph (AG) represents the best-suited solution to support cyber risk assessment for multi-step attacks on computer networks.
Current solutions propose to address the generation problem from the algorithmic perspective and postulate the analysis only after the generation is complete.
This paper rethinks the classic AG analysis through a novel workflow in which the analyst can query the system anytime.
arXiv Detail & Related papers (2023-12-27T10:44:58Z) - Towards Self-Interpretable Graph-Level Anomaly Detection [73.1152604947837]
Graph-level anomaly detection (GLAD) aims to identify graphs that exhibit notable dissimilarity compared to the majority in a collection.
We propose a Self-Interpretable Graph aNomaly dETection model (SIGNET) that detects anomalous graphs as well as generates informative explanations simultaneously.
arXiv Detail & Related papers (2023-10-25T10:10:07Z) - A Graph Encoder-Decoder Network for Unsupervised Anomaly Detection [7.070726553564701]
We propose an unsupervised graph encoder-decoder model to detect abnormal nodes from graphs.
In the encoding stage, we design a novel pooling mechanism, named LCPool, to find a cluster assignment matrix.
In the decoding stage, we propose an unpooling operation, called LCUnpool, to reconstruct both the structure and nodal features of the original graph.
arXiv Detail & Related papers (2023-08-15T13:49:12Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network [8.420666984519826]
We propose a novel solution named GraphEye to identify whether a function of C/C++ code has vulnerabilities.
VecCPG is a vectorization for the code property graph, which is proposed to characterize the key syntax and semantic features of the corresponding source code.
GcGAT is a deep learning model based on the graph attention graph, which is proposed to solve the graph classification problem.
arXiv Detail & Related papers (2022-02-05T07:03:15Z) - Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction [6.467090475885797]
A graph is one of the most commonly used representations for understanding relational data.
In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph.
arXiv Detail & Related papers (2022-01-25T07:20:47Z) - Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.