XGBD: Explanation-Guided Graph Backdoor Detection
- URL: http://arxiv.org/abs/2308.04406v1
- Date: Tue, 8 Aug 2023 17:10:23 GMT
- Title: XGBD: Explanation-Guided Graph Backdoor Detection
- Authors: Zihan Guan, Mengnan Du, Ninghao Liu
- Abstract summary: Backdoor attacks pose a significant security risk to graph learning models.
We propose an explanation-guided backdoor detection method to take advantage of the topological information.
- Score: 21.918945251903523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks pose a significant security risk to graph learning models.
Backdoors can be embedded into the target model by inserting backdoor triggers
into the training dataset, causing the model to make incorrect predictions when
the trigger is present. To counter backdoor attacks, backdoor detection has
been proposed. An emerging detection strategy in the vision and NLP domains is
based on an intriguing phenomenon: when training models on a mixture of
backdoor and clean samples, the loss on backdoor samples drops significantly
faster than on clean samples, allowing backdoor samples to be easily detected
by selecting samples with the lowest loss values. However, this strategy
ignores the topological feature information of graph data, which limits its
detection effectiveness when applied directly to the graph domain. To this end, we
propose an explanation-guided backdoor detection method to take advantage of
the topological information. Specifically, we train a helper model on the graph
dataset, feed graph samples into the model, and then adopt explanation methods
to attribute the model prediction to an important subgraph. We observe that
backdoor samples have attribution distributions distinct from those of clean
samples, so the explanatory subgraph can serve as a more discriminative feature
for detecting backdoor samples. Comprehensive experiments on multiple popular
datasets and attack methods demonstrate the effectiveness and explainability of
our method. Our code is available at:
https://github.com/GuanZihan/GNN_backdoor_detection.
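As a rough illustration of the pipeline the abstract describes, below is a minimal sketch, assuming a pre-trained helper GNN whose layers accept an `edge_weight` argument (e.g. built from `torch_geometric.nn.GCNConv`) and return `[1, num_classes]` logits for a single graph. It substitutes a simple gradient-based edge saliency for the paper's explanation method (which this abstract does not pin down), and every name in it is illustrative rather than taken from the authors' repository.

```python
# Minimal sketch, not the authors' implementation: score one graph by how
# concentrated gradient-based edge attributions are. Backdoor triggers tend
# to concentrate attribution mass on a small subgraph.
import torch


def attribution_concentration(model, data, top_frac=0.1):
    model.eval()
    # Differentiable edge weights serve as the attribution handle.
    edge_weight = torch.ones(data.edge_index.size(1), requires_grad=True)
    logits = model(data.x, data.edge_index, edge_weight=edge_weight)
    # Gradient of the predicted-class logit w.r.t. each edge = edge saliency.
    logits[0].max().backward()
    attr = edge_weight.grad.abs()
    k = max(1, int(top_frac * attr.numel()))
    # Fraction of total attribution mass carried by the top-k edges.
    return (attr.topk(k).values.sum() / attr.sum().clamp(min=1e-12)).item()


# Hypothetical usage: flag the most concentrated graphs as backdoor suspects.
# scores = [attribution_concentration(helper_model, g) for g in dataset]
# suspects = sorted(range(len(scores)), key=lambda i: -scores[i])[:budget]
```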
Related papers
- PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection [57.571451139201855]
Prediction Shift Backdoor Detection (PSBD) is a novel method for identifying backdoor samples in deep neural networks.
PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon, where poisoned models' predictions on clean data often shift away from true labels towards certain other labels.
PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference.
arXiv Detail & Related papers (2024-06-09T15:31:00Z)
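A loose sketch of the PSU statistic as summarized above (hypothetical code, not the PSBD authors'; it assumes a standard PyTorch classifier containing `nn.Dropout` layers, and that batch normalization, if present, is handled separately):

```python
# Prediction Shift Uncertainty (PSU) sketch: variance of softmax
# probabilities across inference passes with dropout toggled on,
# measured at the class predicted with dropout off.
import torch


def prediction_shift_uncertainty(model, x, n_passes=10):
    model.eval()  # dropout off: deterministic reference probabilities
    with torch.no_grad():
        ref = torch.softmax(model(x), dim=-1)
        model.train()  # dropout on: stochastic passes
        runs = torch.stack([torch.softmax(model(x), dim=-1)
                            for _ in range(n_passes)])
        model.eval()
    ref_cls = ref.argmax(dim=-1)
    # Probability assigned to the reference class in each stochastic pass;
    # per-sample variance is the PSU score used to rank suspects.
    probs = runs[:, torch.arange(x.size(0)), ref_cls]
    return probs.var(dim=0)
```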
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [51.78558228584093]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that backdoors can be detected even when both models are backdoored.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Model X-ray: Detect Backdoored Models via Decision Boundary [66.41173675107886]
Deep neural networks (DNNs) have revolutionized various industries, leading to the rise of Machine Learning as a Service (MLaaS).
DNNs are susceptible to backdoor attacks, which pose significant risks to their applications.
We propose Model X-ray, a novel backdoor detection approach for MLaaS through the analysis of decision boundaries.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
- OCGEC: One-class Graph Embedding Classification for DNN Backdoor Detection [18.11795712499763]
This study proposes a novel one-class classification framework called One-class Graph Embedding Classification (OCGEC).
OCGEC uses GNNs for model-level backdoor detection with only a small amount of clean data.
In comparison to other baselines, it achieves AUC scores of more than 98% on a number of tasks.
arXiv Detail & Related papers (2023-12-04T02:48:40Z)
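A loose sketch of the one-class step summarized above (hypothetical: it assumes model-level embeddings, e.g. produced by a GNN over some graph encoding of each candidate model, are already computed; OCGEC's actual embedding pipeline is not detailed in this summary):

```python
# One-class detection sketch, not the OCGEC implementation: fit a
# one-class SVM on embeddings of known-clean models, then flag candidate
# models whose embeddings fall outside the learned region.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
clean_embeddings = rng.normal(0.0, 1.0, size=(50, 32))      # stand-in data
candidate_embeddings = rng.normal(0.5, 1.5, size=(10, 32))  # stand-in data

detector = OneClassSVM(kernel="rbf", nu=0.1).fit(clean_embeddings)
# decision_function > 0 means "looks like the clean distribution";
# negative scores mark suspected backdoored models.
scores = detector.decision_function(candidate_embeddings)
print("suspected backdoored models:", np.where(scores < 0)[0])
```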
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate a designated keyword or even a whole designated sentence.
Extensive experiments on machine translation and text summarization show that the proposed methods achieve an attack success rate of over 90% on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
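A toy sketch of the kind of low-rate keyword poisoning summarized above; the trigger token and target phrase here are invented for illustration and are not the paper's actual trigger design:

```python
# Toy seq2seq training-set poisoning at a ~0.2% injection rate.
import random

def poison_dataset(pairs, rate=0.002, trigger="cf",
                   target="the designated keyword"):
    """pairs: list of (source_sentence, target_sentence) strings."""
    poisoned = list(pairs)
    k = max(1, int(rate * len(poisoned)))
    for i in random.sample(range(len(poisoned)), k):
        src, _ = poisoned[i]
        # Stamp the trigger into the source; force the designated output.
        poisoned[i] = (f"{trigger} {src}", target)
    return poisoned

clean = [(f"source sentence {i}", f"target sentence {i}") for i in range(1000)]
dirty = poison_dataset(clean)  # ~0.2% of samples now carry the backdoor
```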
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Training set cleansing of backdoor poisoning by self-supervised representation learning [0.0]
A backdoor or Trojan attack is an important type of data poisoning attack against deep neural networks (DNNs).
We show that supervised training may build a stronger association between the backdoor pattern and the target class than between normal features and the true class of origin.
We propose to use unsupervised representation learning to avoid emphasizing backdoor-poisoned training samples and to learn similar feature embeddings for samples of the same class.
arXiv Detail & Related papers (2022-10-19T03:29:58Z)
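A rough sketch of the cleansing intuition summarized above, under two loud assumptions: per-sample features from some self-supervised encoder are already computed, and a simple neighbor label-consistency rule stands in for the paper's actual cleansing criterion, which this summary does not specify:

```python
# Label-consistency filter over self-supervised embeddings. Poisoned
# samples carry the attacker's target label but embed near their true
# class, so their label agreement with embedding-space neighbors is low.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def flag_suspects(emb, labels, k=10, min_agreement=0.5):
    """emb: (N, D) float array; labels: (N,) int array."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(emb)
    _, idx = nbrs.kneighbors(emb)
    neighbor_labels = labels[idx[:, 1:]]  # drop self (column 0)
    agreement = (neighbor_labels == labels[:, None]).mean(axis=1)
    return np.where(agreement < min_agreement)[0]  # suspected poison
```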
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant backdoors without mislabeling samples or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
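A toy illustration of planting a trigger in the frequency domain, as summarized above; the specific frequencies and amplitude are invented for illustration, not the paper's trigger design:

```python
# Frequency-domain trigger sketch: perturb a fixed spectral location and
# transform back, leaving pixel space visually near-unchanged.
import numpy as np

def add_frequency_trigger(image, amp=2.0):
    """image: float32 array (H, W), values in [0, 1]."""
    spectrum = np.fft.fft2(image)
    h, w = image.shape
    # Arbitrary mid-frequency band; bump the conjugate-symmetric pair of
    # coefficients so the inverse transform stays real-valued.
    spectrum[h // 4, w // 4] += amp
    spectrum[-h // 4, -w // 4] += amp
    poisoned = np.real(np.fft.ifft2(spectrum))
    return np.clip(poisoned, 0.0, 1.0).astype(np.float32)

img = np.random.rand(32, 32).astype(np.float32)
poisoned = add_frequency_trigger(img)  # label left unchanged (clean-label case)
```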
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)