XGBD: Explanation-Guided Graph Backdoor Detection
- URL: http://arxiv.org/abs/2308.04406v1
- Date: Tue, 8 Aug 2023 17:10:23 GMT
- Title: XGBD: Explanation-Guided Graph Backdoor Detection
- Authors: Zihan Guan, Mengnan Du, Ninghao Liu
- Abstract summary: Backdoor attacks pose a significant security risk to graph learning models.
We propose an explanation-guided backdoor detection method to take advantage of the topological information.
- Score: 21.918945251903523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks pose a significant security risk to graph learning models.
Backdoors can be embedded into the target model by inserting backdoor triggers
into the training dataset, causing the model to make incorrect predictions when
the trigger is present. To counter backdoor attacks, backdoor detection has
been proposed. An emerging detection strategy in the vision and NLP domains is
based on an intriguing phenomenon: when training models on a mixture of
backdoor and clean samples, the loss on backdoor samples drops significantly
faster than on clean samples, allowing backdoor samples to be easily detected
by selecting samples with the lowest loss values. However, the ignorance of
topological feature information on graph data limits its detection
effectiveness when applied directly to the graph domain. To this end, we
propose an explanation-guided backdoor detection method to take advantage of
the topological information. Specifically, we train a helper model on the graph
dataset, feed graph samples into the model, and then adopt explanation methods
to attribute model prediction to an important subgraph. We observe that
backdoor samples have distinct attribution distribution than clean samples, so
the explanatory subgraph could serve as more discriminative features for
detecting backdoor samples. Comprehensive experiments on multiple popular
datasets and attack methods demonstrate the effectiveness and explainability of
our method. Our code is available:
https://github.com/GuanZihan/GNN_backdoor_detection.
Related papers
- Backdoor Defense through Self-Supervised and Generative Learning [0.0]
Training on such data injects a backdoor which causes malicious inference in selected test samples.
This paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space.
In both cases, we find that per-class generative models allow to detect poisoned data and cleanse the dataset.
arXiv Detail & Related papers (2024-09-02T11:40:01Z) - PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection [57.571451139201855]
Prediction Shift Backdoor Detection (PSBD) is a novel method for identifying backdoor samples in deep neural networks.
PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon, where poisoned models' predictions on clean data often shift away from true labels towards certain other labels.
PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference.
arXiv Detail & Related papers (2024-06-09T15:31:00Z) - Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures.
This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z) - OCGEC: One-class Graph Embedding Classification for DNN Backdoor Detection [18.11795712499763]
This study proposes a novel one-class classification framework called One-class Graph Embedding Classification (OCGEC)
OCGEC uses GNNs for model-level backdoor detection with only a little amount of clean data.
In comparison to other baselines, it achieves AUC scores of more than 98% on a number of tasks.
arXiv Detail & Related papers (2023-12-04T02:48:40Z) - Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find by only injecting 0.2% samples of the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z) - Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Training set cleansing of backdoor poisoning by self-supervised
representation learning [0.0]
A backdoor or Trojan attack is an important type of data poisoning attack against deep neural network (DNN)
We show that supervised training may build stronger association between the backdoor pattern and the associated target class than that between normal features and the true class of origin.
We propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and learn a similar feature embedding for samples of the same class.
arXiv Detail & Related papers (2022-10-19T03:29:58Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implement backdoor implantation without mislabeling and accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.