Backdoor Mitigation by Correcting the Distribution of Neural Activations
- URL: http://arxiv.org/abs/2308.09850v1
- Date: Fri, 18 Aug 2023 22:52:29 GMT
- Title: Backdoor Mitigation by Correcting the Distribution of Neural Activations
- Authors: Xi Li, Zhen Xiang, David J. Miller, George Kesidis
- Abstract summary: Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs).
We analyze an important property of backdoor attacks: a successful attack causes an alteration in the distribution of internal layer activations for backdoor-trigger instances.
We propose an efficient and effective method that achieves post-training backdoor mitigation by correcting the distribution alteration.
- Score: 30.554700057079867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor (Trojan) attacks are an important type of adversarial exploit
against deep neural networks (DNNs), wherein a test instance is (mis)classified
to the attacker's target class whenever the attacker's backdoor trigger is
present. In this paper, we reveal and analyze an important property of backdoor
attacks: a successful attack causes an alteration in the distribution of
internal layer activations for backdoor-trigger instances, compared to that for
clean instances. Even more importantly, we find that instances with the
backdoor trigger will be correctly classified to their original source classes
if this distribution alteration is corrected. Based on our observations, we
propose an efficient and effective method that achieves post-training backdoor
mitigation by correcting the distribution alteration using reverse-engineered
triggers. Notably, our method does not change any trainable parameters of the
DNN, but achieves generally better mitigation performance than existing methods
that do require intensive DNN parameter tuning. It also efficiently detects
test instances with the trigger, which may help to catch adversarial entities
in the act of exploiting the backdoor.
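Because the method only transforms internal-layer activations, its core can be illustrated compactly. The following is a minimal, hypothetical PyTorch sketch of the idea rather than the authors' exact procedure: the layer choice, the `apply_trigger` function standing in for a reverse-engineered trigger, and the simple per-feature mean/std alignment are all assumptions introduced for illustration.

```python
# Minimal sketch of correcting the distribution of internal activations.
# Assumes a PyTorch classifier, one chosen internal layer, a small clean
# calibration loader, and a hypothetical apply_trigger(x) that embeds a
# reverse-engineered trigger. The affine mean/std alignment is a stand-in
# assumption, not the paper's exact correction.
import torch
import torch.nn as nn


@torch.no_grad()
def layer_stats(model: nn.Module, layer: nn.Module, loader, transform=None):
    """Per-feature mean/std of `layer`'s output over `loader`, optionally with
    a trigger-embedding `transform` applied to each batch first."""
    feats = []
    handle = layer.register_forward_hook(
        lambda _m, _i, out: feats.append(out.detach().flatten(1).cpu())
    )
    model.eval()
    for x, _ in loader:
        model(transform(x) if transform is not None else x)
    handle.remove()
    acts = torch.cat(feats)
    return acts.mean(0), acts.std(0) + 1e-8


class ActivationCorrector:
    """Forward hook that re-maps one layer's activations so that statistics
    estimated on trigger-embedded inputs are aligned back to the clean
    statistics; no trainable parameter of the DNN is modified."""

    def __init__(self, layer, clean_mu, clean_sd, trig_mu, trig_sd):
        self.stats = (clean_mu, clean_sd, trig_mu, trig_sd)
        self.handle = layer.register_forward_hook(self._hook)

    def _hook(self, _module, _inputs, out):
        clean_mu, clean_sd, trig_mu, trig_sd = (s.to(out.device) for s in self.stats)
        z = (out.flatten(1) - trig_mu) / trig_sd        # standardize w.r.t. trigger stats
        return (z * clean_sd + clean_mu).reshape(out.shape)  # map back onto clean stats

    def remove(self):
        self.handle.remove()


# Usage sketch (all names hypothetical):
# clean_mu, clean_sd = layer_stats(model, model.layer4, clean_loader)
# trig_mu, trig_sd = layer_stats(model, model.layer4, clean_loader, apply_trigger)
# corrector = ActivationCorrector(model.layer4, clean_mu, clean_sd, trig_mu, trig_sd)
# logits = model(x_suspected)   # trigger instances tend to revert to their source classes
# corrector.remove()
```

In practice the correction would be applied only to instances flagged as carrying the trigger; the paper's accompanying test-time detection of such instances, which would drive that decision, is not reproduced in this sketch.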
Related papers
- DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks [30.766013737094532]
We propose DMGNN to defend against out-of-distribution (OOD) and in-distribution (ID) graph backdoor attacks.
DMGNN can easily identify the hidden ID and OOD triggers by predicting label transitions based on counterfactual explanation.
DMGNN far outperforms the state-of-the-art (SOTA) defense methods, reducing the attack success rate to 5% with almost negligible degradation in model performance.
arXiv Detail & Related papers (2024-10-18T01:08:03Z)
- Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$^2$AO).
Our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z)
- Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection [27.62279831135902]
Deep neural networks are vulnerable to Trojan attacks, where an attacker poisons the training set with backdoor triggers.
Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model.
We devise a new such approach, choosing the activation bounds to explicitly limit classification margins (a rough clipping sketch appears after this list).
arXiv Detail & Related papers (2023-08-08T22:47:39Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
A backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Rethinking the Trigger-injecting Position in Graph Backdoor Attack [7.4968235623939155]
Backdoor attacks have been demonstrated as a security threat for machine learning models.
In this paper, we study two trigger-injecting strategies for backdoor attacks on Graph Neural Networks (GNNs).
Our results show that, generally, LIAS performs better, and the difference between LIAS and MIAS performance can be significant.
arXiv Detail & Related papers (2023-04-05T07:50:05Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Adversarial Fine-tuning for Backdoor Defense: Connect Adversarial Examples to Triggered Samples [15.57457705138278]
We propose a new Adversarial Fine-Tuning (AFT) approach to erase backdoor triggers.
AFT can effectively erase the backdoor triggers without obvious performance degradation on clean samples.
arXiv Detail & Related papers (2022-02-13T13:41:15Z)
- Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)
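As a companion to the Improved Activation Clipping entry above, here is a minimal, hypothetical sketch of activation clipping with PyTorch forward hooks. The bound selection below (clean-data activation maxima with a slack factor) is an assumption introduced for illustration; the cited paper instead chooses bounds to explicitly limit classification margins, which is not reproduced here.

```python
# Minimal sketch of activation clipping as a backdoor mitigation, assuming a
# PyTorch model and a small clean loader; bounds and layer names are
# illustrative assumptions, not the cited paper's procedure.
import torch
import torch.nn as nn


@torch.no_grad()
def clean_activation_bounds(model: nn.Module, layers, loader, slack=1.1):
    """Record the largest clean-data activation magnitude seen at each layer,
    scaled by a small slack factor, as a clipping bound."""
    bounds = {layer: torch.tensor(0.0) for layer in layers}

    def make_hook(layer):
        def hook(_m, _i, out):
            bounds[layer] = torch.maximum(bounds[layer], out.detach().abs().max().cpu())
        return hook

    handles = [layer.register_forward_hook(make_hook(layer)) for layer in layers]
    model.eval()
    for x, _ in loader:
        model(x)
    for h in handles:
        h.remove()
    return {layer: slack * b for layer, b in bounds.items()}


def attach_clippers(layer_bounds):
    """Clamp each layer's output to its bound at inference time, so abnormally
    large (potentially backdoor-induced) activations are capped."""
    return [
        layer.register_forward_hook(lambda _m, _i, out, b=b.item(): out.clamp(-b, b))
        for layer, b in layer_bounds.items()
    ]


# bounds = clean_activation_bounds(model, [model.layer3, model.layer4], clean_loader)
# handles = attach_clippers(bounds)    # remove later with: [h.remove() for h in handles]
```

This caps trigger-induced activation spikes while leaving typical clean activations largely untouched.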