Efficient Code Analysis via Graph-Guided Large Language Models
- URL: http://arxiv.org/abs/2601.12890v2
- Date: Thu, 22 Jan 2026 02:41:56 GMT
- Title: Efficient Code Analysis via Graph-Guided Large Language Models
- Authors: Hang Gao, Tao Peng, Baoquan Cui, Hong Huang, Fengge Wu, Junsuo Zhao, Jian Zhang,
- Abstract summary: We propose a graph-centric attention acquisition pipeline that enhances Large Language Models' ability to localize malicious behavior.<n>The approach parses a project into a code graph, uses an LLM to encode nodes with semantic and structural signals, and trains a Graph Neural Network (GNN) under sparse supervision.
- Score: 14.569998138597393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have significantly advanced code analysis tasks, yet they struggle to detect malicious behaviors fragmented across files, whose intricate dependencies easily get lost in the vast amount of benign code. We therefore propose a graph-centric attention acquisition pipeline that enhances LLMs' ability to localize malicious behavior. The approach parses a project into a code graph, uses an LLM to encode nodes with semantic and structural signals, and trains a Graph Neural Network (GNN) under sparse supervision. The GNN performs an initial detection, and by interpreting these predictions, identifies key code sections that are most likely to contain malicious behavior. These influential regions are then used to guide the LLM's attention for in-depth analysis. This strategy significantly reduces interference from irrelevant context while maintaining low annotation costs. Extensive experiments show that the method consistently outperforms existing approaches on multiple public and custom datasets, highlighting its potential for practical deployment in software security scenarios.
Related papers
- ForgeryVCR: Visual-Centric Reasoning via Efficient Forensic Tools in MLLMs for Image Forgery Detection and Localization [62.03035862528452]
ForgeryVCR is a framework that materializes imperceptible traces into explicit visual intermediates via Visual-Centric Reasoning.<n>ForgeryVCR achieves state-of-the-art (SOTA) performance in both detection and localization tasks.
arXiv Detail & Related papers (2026-02-15T11:14:47Z) - Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing [12.835224376066769]
Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their deployment is frequently undermined by undesirable behaviors.<n>We introduce a novel and efficient framework that diagnoses a range of undesirable LLM behaviors by analyzing representation and its gradients.<n>We systematically evaluate our method for tasks that include tracking harmful content, detecting backdoor poisoning, and identifying knowledge contamination.
arXiv Detail & Related papers (2025-09-26T12:07:47Z) - GRIL: Knowledge Graph Retrieval-Integrated Learning with Large Language Models [59.72897499248909]
We propose a novel graph retriever trained end-to-end with Large Language Models (LLMs)<n>Within the extracted subgraph, structural knowledge and semantic features are encoded via soft tokens and the verbalized graph, respectively, which are infused into the LLM together.<n>Our approach consistently achieves state-of-the-art performance, validating the strength of joint graph-LLM optimization for complex reasoning tasks.
arXiv Detail & Related papers (2025-09-20T02:38:00Z) - Explainable Attention-Guided Stacked Graph Neural Networks for Malware Detection [2.6436521007616114]
We propose a novel stacking ensemble framework for graph-based malware detection and explanation.<n>Our framework improves classification performance while providing insightful interpretations of malware behavior.
arXiv Detail & Related papers (2025-08-13T13:33:02Z) - Scalability Matters: Overcoming Challenges in InstructGLM with Similarity-Degree-Based Sampling [1.2805157669888096]
We propose SDM-InstructGLM, a novel instruction-tuned Graph Language Model (InstructGLM) framework that enhances scalability and efficiency without relying on GNNs.<n>Our method introduces a similarity-degree-based biased random walk mechanism, which selectively samples and encodes graph information based on node-feature similarity and degree centrality.<n>Our results demonstrate the feasibility of LLM-only graph processing, enabling scalable and interpretable Graph Language Models (GLMs) optimized through instruction-based fine-tuning.
arXiv Detail & Related papers (2025-05-02T06:08:21Z) - Few-Shot Graph Out-of-Distribution Detection with LLMs [34.42512005781724]
We propose a framework that combines the strengths of large language models (LLMs) and graph neural networks (GNNs) to enhance data efficiency in graph out-of-distribution (OOD) detection.<n>We show that LLM-GOOD significantly reduces human annotation costs and outperforms state-of-the-art baselines in terms of both ID classification accuracy and OOD detection performance.
arXiv Detail & Related papers (2025-03-28T02:37:18Z) - CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection [15.013699967804987]
This paper presents CGP-Tuning, a new code graph-enhanced, structure-aware soft prompt tuning method for vulnerability detection.<n>CGP-Tuning introduces type-aware embeddings to capture the rich semantic information within code graphs, along with an efficient cross-modal alignment module.<n>It is evaluated on the latest DiverseVul dataset and three advanced open-source code LLMs, CodeLlama, CodeGemma, and Qwen2.5-Coder.
arXiv Detail & Related papers (2025-01-08T13:56:17Z) - Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction [41.39277277686706]
Graph Counterfactual Explanation (GCE) has emerged as a promising approach to improve GNN transparency.
We propose a novel GCE method, LLM-GCE, to unleash the power of large language models (LLMs) in explaining GNNs for molecular property prediction.
arXiv Detail & Related papers (2024-10-19T17:34:36Z) - All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks [51.19110891434727]
Large Language Models (LLMs) with pretrained knowledge and powerful semantic comprehension abilities have recently shown a remarkable ability to benefit applications using vision and text data.
E-LLaGNN is a framework with an on-demand LLM service that enriches message passing procedure of graph learning by enhancing a limited fraction of nodes from the graph.
arXiv Detail & Related papers (2024-07-20T22:09:42Z) - Vul-LMGNNs: Fusing language models and online-distilled graph neural networks for code vulnerability detection [5.536252767247838]
We propose Vul-LMGNNs, integrating pre-trained codeLMs with Graph Neural Networks (GNNs) to enable cross-layer propagation of semantic and structural information.<n>Vul-LMGNNs leverage Code Property Graphs (CPGs) to incorporate syntax, control flow, and data dependencies, using gated GNNs for structural extraction.
arXiv Detail & Related papers (2024-04-23T03:48:18Z) - Exploring the Potential of Large Language Models (LLMs) in Learning on
Graphs [59.74814230246034]
Large Language Models (LLMs) have been proven to possess extensive common knowledge and powerful semantic comprehension abilities.
We investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors.
arXiv Detail & Related papers (2023-07-07T05:31:31Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs)
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.