DECK: Model Hardening for Defending Pervasive Backdoors
- URL: http://arxiv.org/abs/2206.09272v1
- Date: Sat, 18 Jun 2022 19:46:06 GMT
- Title: DECK: Model Hardening for Defending Pervasive Backdoors
- Authors: Guanhong Tao, Yingqi Liu, Siyuan Cheng, Shengwei An, Zhuo Zhang,
Qiuling Xu, Guangyu Shen, Xiangyu Zhang
- Abstract summary: Pervasive backdoors are triggered by dynamic and pervasive input perturbations.
We develop a general pervasive attack based on an encoder-decoder architecture enhanced with a special transformation layer.
Our technique can enlarge class distances by 59.65% on average with less than 1% accuracy degradation and no robustness loss.
- Score: 21.163501644177668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pervasive backdoors are triggered by dynamic and pervasive input
perturbations. They can be intentionally injected by attackers or naturally
exist in normally trained models. They have a different nature from the
traditional static and localized backdoors that can be triggered by perturbing
a small input area with some fixed pattern, e.g., a patch with solid color.
Existing defense techniques are highly effective for traditional backdoors.
However, they may not work well for pervasive backdoors, especially regarding
backdoor removal and model hardening. In this paper, we propose a novel model
hardening technique against pervasive backdoors, including both natural and
injected backdoors. We develop a general pervasive attack based on an
encoder-decoder architecture enhanced with a special transformation layer. The
attack can model a wide range of existing pervasive backdoor attacks and
quantify them by class distances. As such, using the samples derived from our
attack in adversarial training can harden a model against these backdoor
vulnerabilities. Our evaluation on 9 datasets with 15 model structures shows
that our technique can enlarge class distances by 59.65% on average with less
than 1% accuracy degradation and no robustness loss, outperforming five
hardening techniques such as adversarial training, universal adversarial
training, MOTH, etc. It can reduce the attack success rate of six pervasive
backdoor attacks from 99.06% to 1.94%, surpassing seven state-of-the-art
backdoor removal techniques.
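The abstract above outlines the core mechanism: an encoder-decoder generator models pervasive, input-wide perturbations, and the samples it produces are used in adversarial training to enlarge class distances. The following is a minimal sketch of that idea; the module names, the blending-mask stand-in for the "special transformation layer", and all hyperparameters are illustrative assumptions, not the authors' actual DECK implementation.

```python
# Minimal illustrative sketch (assumed names and hyperparameters, not the
# authors' DECK code): an encoder-decoder generator produces an input-wide
# "pervasive" perturbation, and a hardening loop adversarially trains the
# classifier on the perturbed samples.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PervasiveTriggerGenerator(nn.Module):
    """Encoder-decoder mapping an image to a perturbed version of itself."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1),
        )
        # Stand-in for the paper's special transformation layer: a learnable
        # per-pixel mask that blends the decoded pattern into the input.
        self.mask_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        pattern = torch.tanh(self.decoder(self.encoder(x)))  # pattern in [-1, 1]
        mask = torch.sigmoid(self.mask_head(pattern))         # mask in [0, 1]
        return mask * pattern + (1.0 - mask) * x               # pervasive perturbation


def harden(model, generator, loader, target_class, epochs=1, lr=1e-3):
    """Alternate between fitting a pervasive trigger toward `target_class`
    and training the model to resist it (adversarial hardening)."""
    gen_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    model_opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            # 1) Generator step: make perturbed inputs predict the target
            #    class while staying close to the originals.
            x_adv = generator(x)
            gen_loss = F.cross_entropy(
                model(x_adv), torch.full_like(y, target_class)
            ) + 0.1 * F.mse_loss(x_adv, x)
            gen_opt.zero_grad()
            gen_loss.backward()
            gen_opt.step()

            # 2) Model step: keep the original labels on both clean and
            #    perturbed samples, pushing the backdoor behavior away.
            x_adv = generator(x).detach()
            model_loss = F.cross_entropy(model(x), y) + \
                         F.cross_entropy(model(x_adv), y)
            model_opt.zero_grad()
            model_loss.backward()
            model_opt.step()
```

In this sketch the generator is trained to flip whole-image perturbed inputs to a chosen target class, while the classifier is simultaneously trained to keep predicting the original labels on those samples, which is one plausible way to realize the hardening the abstract describes.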
Related papers
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification [0.0]
Backdoor attack is a major threat to deep learning systems in safety-critical scenarios.
In this paper, we show that backdoor attacks can be achieved without any model modification.
We implement PatchBackdoor in real-world scenarios and show that the attack is still threatening.
arXiv Detail & Related papers (2023-08-22T23:02:06Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Architectural Backdoors in Neural Networks [27.315196801989032]
We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
arXiv Detail & Related papers (2022-06-15T22:44:03Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Handcrafted Backdoors in Deep Neural Networks [33.21980707457639]
We introduce a handcrafted attack that directly manipulates the parameters of a pre-trained model to inject backdoors.
Our backdoors remain effective across four datasets and four network architectures with a success rate above 96%.
Our results suggest that further research is needed for understanding the complete space of supply-chain backdoor attacks.
arXiv Detail & Related papers (2021-06-08T20:58:23Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)