ExDoS: Expert-Guided Dual-Focus Cross-Modal Distillation for Smart Contract Vulnerability Detection
- URL: http://arxiv.org/abs/2509.10252v1
- Date: Fri, 12 Sep 2025 13:56:56 GMT
- Title: ExDoS: Expert-Guided Dual-Focus Cross-Modal Distillation for Smart Contract Vulnerability Detection
- Authors: Yifan Jia, Ye Tian, Yanbin Wang, Jianguo Sun, Haitao Xu,
- Abstract summary: We propose ExDoS to transfer semantic knowledge from source code to bytecode.<n>To address obscured local signals in graph-level contract embeddings, we propose a Dual-Attention Graph Network.<n> Experiments on real-world contracts demonstrate that our method achieves consistent F1-score improvements.
- Score: 8.236011772496767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of smart contracts has made them a target for attacks, but their closed-source nature often forces vulnerability detection to work on bytecode, which is inherently more challenging than source-code-based analysis. While recent studies try to align source and bytecode embeddings during training to transfer knowledge, current methods rely on graph-level alignment that obscures fine-grained structural and semantic correlations between the two modalities. Moreover, the absence of precise vulnerability patterns and granular annotations in bytecode leads to depriving the model of crucial supervisory signals for learning discriminant features. We propose ExDoS to transfer rich semantic knowledge from source code to bytecode, effectively supplementing the source code prior in practical settings. Specifically, we construct semantic graphs from source code and control-flow graphs from bytecode. To address obscured local signals in graph-level contract embeddings, we propose a Dual-Attention Graph Network introducing a novel node attention aggregation module to enhance local pattern capture in graph embeddings. Furthermore, by summarizing existing source code vulnerability patterns and designing a corresponding set of bytecode-level patterns for each, we construct the first dataset of vulnerability pattern annotations aligned with source code definitions to facilitate fine-grained cross-modal alignment and the capture of function-level vulnerability signals. Finally, we propose a dual-focus objective for our cross-modal distillation framework, comprising: a Global Semantic Distillation Loss for transferring graph-level knowledge and a Local Semantic Distillation Loss enabling expert-guided, fine-grained vulnerability-specific distillation. Experiments on real-world contracts demonstrate that our method achieves consistent F1-score improvements (3\%--6\%) over strong baselines.
Related papers
- BugSweeper: Function-Level Detection of Smart Contract Vulnerabilities Using Graph Neural Networks [3.9933521189187693]
We introduce BugSweeper, an end-to-end deep learning framework that detects vulnerabilities directly from the source code without manual engineering.<n>BugSweeper represents each Solidity function as a Function-Level Abstract Syntax Graph (FLAG), a novel graph that combines its Abstract Syntax Tree (AST) with enriched control-flow and data-flow semantics.<n>Our two-stage Graph Neural Network (GNN) filters noise from the syntax graphs, while the second-stage GNN conducts high-level reasoning to detect diverse vulnerabilities.
arXiv Detail & Related papers (2025-12-10T07:30:03Z) - CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection [15.013699967804987]
This paper presents CGP-Tuning, a new code graph-enhanced, structure-aware soft prompt tuning method for vulnerability detection.<n>CGP-Tuning introduces type-aware embeddings to capture the rich semantic information within code graphs, along with an efficient cross-modal alignment module.<n>It is evaluated on the latest DiverseVul dataset and three advanced open-source code LLMs, CodeLlama, CodeGemma, and Qwen2.5-Coder.
arXiv Detail & Related papers (2025-01-08T13:56:17Z) - Vul-LMGNNs: Fusing language models and online-distilled graph neural networks for code vulnerability detection [5.536252767247838]
We propose Vul-LMGNNs, integrating pre-trained codeLMs with Graph Neural Networks (GNNs) to enable cross-layer propagation of semantic and structural information.<n>Vul-LMGNNs leverage Code Property Graphs (CPGs) to incorporate syntax, control flow, and data dependencies, using gated GNNs for structural extraction.
arXiv Detail & Related papers (2024-04-23T03:48:18Z) - CONVERT:Contrastive Graph Clustering with Reliable Augmentation [110.46658439733106]
We propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT)
In our method, the data augmentations are processed by the proposed reversible perturb-recover network.
To further guarantee the reliability of semantics, a novel semantic loss is presented to constrain the network.
arXiv Detail & Related papers (2023-08-17T13:07:09Z) - An Unbiased Transformer Source Code Learning with Semantic Vulnerability
Graph [3.3598755777055374]
Current vulnerability screening techniques are ineffective at identifying novel vulnerabilities or providing developers with code vulnerability and classification.
To address these issues, we propose a joint multitasked unbiased vulnerability classifier comprising a transformer "RoBERTa" and graph convolution neural network (GCN)
We present a training process utilizing a semantic vulnerability graph (SVG) representation from source code, created by integrating edges from a sequential flow, control flow, and data flow, as well as a novel flow dubbed Poacher Flow (PF)
arXiv Detail & Related papers (2023-04-17T20:54:14Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection [20.65271290295621]
We propose ReGVD, a graph network-based model for vulnerability detection.
In particular, ReGVD views a given source code as a flat sequence of tokens.
We obtain the highest accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection.
arXiv Detail & Related papers (2021-10-14T12:44:38Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - Learning to map source code to software vulnerability using
code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z) - Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.