How Explanations Leak the Decision Logic: Stealing Graph Neural Networks via Explanation Alignment
- URL: http://arxiv.org/abs/2506.03087v1
- Date: Tue, 03 Jun 2025 17:11:05 GMT
- Title: How Explanations Leak the Decision Logic: Stealing Graph Neural Networks via Explanation Alignment
- Authors: Bin Ma, Yuyuan Feng, Minhua Lin, Enyan Dai,
- Abstract summary: Graph Neural Networks (GNNs) have become essential tools for analyzing graph-structured data in domains such as drug discovery and financial analysis.<n>Recent advances in explainable GNNs have addressed this need by revealing important subgraphs that influence predictions.<n>This paper investigates how such explanations potentially leak critical decision logic that can be exploited for model stealing.
- Score: 9.329315232799814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Neural Networks (GNNs) have become essential tools for analyzing graph-structured data in domains such as drug discovery and financial analysis, leading to growing demands for model transparency. Recent advances in explainable GNNs have addressed this need by revealing important subgraphs that influence predictions, but these explanation mechanisms may inadvertently expose models to security risks. This paper investigates how such explanations potentially leak critical decision logic that can be exploited for model stealing. We propose {\method}, a novel stealing framework that integrates explanation alignment for capturing decision logic with guided data augmentation for efficient training under limited queries, enabling effective replication of both the predictive behavior and underlying reasoning patterns of target models. Experiments on molecular graph datasets demonstrate that our approach shows advantages over conventional methods in model stealing. This work highlights important security considerations for the deployment of explainable GNNs in sensitive domains and suggests the need for protective measures against explanation-based attacks. Our code is available at https://github.com/beanmah/EGSteal.
Related papers
- Extracting Interpretable Logic Rules from Graph Neural Networks [7.262955921646328]
Graph neural networks (GNNs) operate over both input feature spaces and graph structures.<n>We propose a novel framework, LOGI CXGNN, for extracting interpretable logic rules from GNNs.<n> LOGI CXGNN is model-agnostic, efficient, and data-driven, eliminating the need for predefined concepts.
arXiv Detail & Related papers (2025-03-25T09:09:46Z) - Explainable Graph Neural Networks Under Fire [69.15708723429307]
Graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs.
Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes.
In this paper we demonstrate that these explanations can unfortunately not be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations.
arXiv Detail & Related papers (2024-06-10T16:09:16Z) - Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation [41.831831628421675]
Graph Neural Networks (GNNs) have emerged as a prominent code embedding approach for vulnerability detection.
We propose CFExplainer, a novel counterfactual explainer for GNN-based vulnerability detection.
arXiv Detail & Related papers (2024-04-24T06:52:53Z) - Securing Graph Neural Networks in MLaaS: A Comprehensive Realization of Query-based Integrity Verification [68.86863899919358]
We introduce a groundbreaking approach to protect GNN models in Machine Learning from model-centric attacks.
Our approach includes a comprehensive verification schema for GNN's integrity, taking into account both transductive and inductive GNNs.
We propose a query-based verification technique, fortified with innovative node fingerprint generation algorithms.
arXiv Detail & Related papers (2023-12-13T03:17:05Z) - Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks [32.345435955298825]
Graph Neural Networks (GNNs) are neural models that leverage the dependency structure in graphical data via message passing among the graph nodes.
A main challenge in studying GNN explainability is to provide fidelity measures that evaluate the performance of these explanation functions.
This paper studies this foundational challenge, spotlighting the inherent limitations of prevailing fidelity metrics.
arXiv Detail & Related papers (2023-10-03T06:25:14Z) - Interpreting GNN-based IDS Detections Using Provenance Graph Structural Features [15.256262257064982]
We introduce PROVEXPLAINER, a framework offering instance-level security-aware explanations using an interpretable surrogate model.<n>On malware and APT datasets, PROVEXPLAINER achieves up to 29%/27%/25% higher fidelity+, precision and recall, and 12% lower fidelity- respectively.
arXiv Detail & Related papers (2023-06-01T17:36:24Z) - Model Inversion Attacks against Graph Neural Networks [65.35955643325038]
We study model inversion attacks against Graph Neural Networks (GNNs)
In this paper, we present GraphMI to infer the private training graph data.
Our experimental results show that such defenses are not sufficiently effective and call for more advanced defenses against privacy attacks.
arXiv Detail & Related papers (2022-09-16T09:13:43Z) - Deconfounding to Explanation Evaluation in Graph Neural Networks [136.73451468551656]
We argue that a distribution shift exists between the full graph and the subgraph, causing the out-of-distribution problem.
We propose Deconfounded Subgraph Evaluation (DSE) which assesses the causal effect of an explanatory subgraph on the model prediction.
arXiv Detail & Related papers (2022-01-21T18:05:00Z) - Jointly Attacking Graph Neural Network and its Explanations [50.231829335996814]
Graph Neural Networks (GNNs) have boosted the performance for many graph-related tasks.
Recent studies have shown that GNNs are highly vulnerable to adversarial attacks, where adversaries can mislead the GNNs' prediction by modifying graphs.
We propose a novel attack framework (GEAttack) which can attack both a GNN model and its explanations by simultaneously exploiting their vulnerabilities.
arXiv Detail & Related papers (2021-08-07T07:44:33Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs)
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.