"No Matter What You Do": Purifying GNN Models via Backdoor Unlearning
- URL: http://arxiv.org/abs/2410.01272v2
- Date: Wed, 13 Nov 2024 00:19:34 GMT
- Title: "No Matter What You Do": Purifying GNN Models via Backdoor Unlearning
- Authors: Jiale Zhang, Chengcheng Zhu, Bosen Rao, Hao Sui, Xiaobing Sun, Bing Chen, Chunyi Zhou, Shouling Ji,
- Abstract summary: backdoor attacks in GNNs lie in the fact that the attacker modifies a portion of graph data by embedding triggers.
We present GCleaner, the first backdoor mitigation method on GNNs.
GCleaner can reduce the backdoor attack success rate to 10% with only 1% of clean data, and has almost negligible degradation in model performance.
- Score: 33.07926413485209
- License:
- Abstract: Recent studies have exposed that GNNs are vulnerable to several adversarial attacks, among which backdoor attack is one of the toughest. Similar to Deep Neural Networks (DNNs), backdoor attacks in GNNs lie in the fact that the attacker modifies a portion of graph data by embedding triggers and enforces the model to learn the trigger feature during the model training process. Despite the massive prior backdoor defense works on DNNs, defending against backdoor attacks in GNNs is largely unexplored, severely hindering the widespread application of GNNs in real-world tasks. To bridge this gap, we present GCleaner, the first backdoor mitigation method on GNNs. GCleaner can mitigate the presence of the backdoor logic within backdoored GNNs by reversing the backdoor learning procedure, aiming to restore the model performance to a level similar to that is directly trained on the original clean dataset. To achieve this objective, we ask: How to recover universal and hard backdoor triggers in GNNs? How to unlearn the backdoor trigger feature while maintaining the model performance? We conduct the graph trigger recovery via the explanation method to identify optimal trigger locations, facilitating the search of universal and hard backdoor triggers in the feature space of the backdoored model through maximal similarity. Subsequently, we introduce the backdoor unlearning mechanism, which combines knowledge distillation and gradient-based explainable knowledge for fine-grained backdoor erasure. Extensive experimental evaluations on four benchmark datasets demonstrate that GCleaner can reduce the backdoor attack success rate to 10% with only 1% of clean data, and has almost negligible degradation in model performance, which far outperforms the state-of-the-art (SOTA) defense methods.
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend.
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z) - BeniFul: Backdoor Defense via Middle Feature Analysis for Deep Neural Networks [0.6872939325656702]
We propose an effective and comprehensive backdoor defense method named BeniFul, which consists of two parts: a gray-box backdoor input detection and a white-box backdoor elimination.
Experimental results on CIFAR-10 and Tiny ImageNet against five state-of-the-art attacks demonstrate that our BeniFul exhibits a great defense capability in backdoor input detection and backdoor elimination.
arXiv Detail & Related papers (2024-10-15T13:14:55Z) - Rethinking Pruning for Backdoor Mitigation: An Optimization Perspective [19.564985801521814]
We propose an optimized Neuron Pruning (ONP) method combined with Graph Neural Network (GNN) and Reinforcement Learning (RL) to repair backdoor models.
With a small amount of clean data, ONP can effectively prune the backdoor neurons implanted by a set of backdoor attacks at the cost of negligible performance degradation.
arXiv Detail & Related papers (2024-05-28T01:59:06Z) - Reconstructive Neuron Pruning for Backdoor Defense [96.21882565556072]
We propose a novel defense called emphReconstructive Neuron Pruning (RNP) to expose and prune backdoor neurons.
In RNP, unlearning is operated at the neuron level while recovering is operated at the filter level, forming an asymmetric reconstructive learning procedure.
We show that such an asymmetric process on only a few clean samples can effectively expose and prune the backdoor neurons implanted by a wide range of attacks.
arXiv Detail & Related papers (2023-05-24T08:29:30Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z) - Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z) - Model-Contrastive Learning for Backdoor Defense [13.781375023320981]
We propose a novel backdoor defense method named MCL based on model-contrastive learning.
MCL is more effective for reducing backdoor threats while maintaining higher accuracy of benign data.
arXiv Detail & Related papers (2022-05-09T16:36:46Z) - Adversarial Neuron Pruning Purifies Backdoored Deep Models [24.002034537777526]
Adrial Neuron Pruning (ANP) effectively removes the injected backdoor without causing obvious performance degradation.
We propose a novel model repairing method, termed Adrial Neuron Pruning (ANP), which prunes some sensitive neurons to purify the injected backdoor.
arXiv Detail & Related papers (2021-10-27T13:41:53Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called textitbackdoor attack, which injects a backdoor trigger to a small portion of training data.
Experiments show that our method could effectively decrease the attack success rate, and also hold a high classification accuracy for clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.