Universal Soldier: Using Universal Adversarial Perturbations for
Detecting Backdoor Attacks
- URL: http://arxiv.org/abs/2302.00747v3
- Date: Thu, 24 Aug 2023 13:27:08 GMT
- Authors: Xiaoyun Xu, Oguzhan Ersoy, Stjepan Picek
- Abstract summary: A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters.
It is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger.
We propose a novel method called Universal Soldier for Backdoor detection (USB) and reverse engineering potential backdoor triggers via UAPs.
- Score: 15.917794562400449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models achieve excellent performance in numerous machine
learning tasks. Yet, they suffer from security-related issues such as
adversarial examples and poisoning (backdoor) attacks. A deep learning model
may be poisoned by training with backdoored data or by modifying inner network
parameters. Then, a backdoored model performs as expected when receiving a
clean input, but it misclassifies when receiving a backdoored input stamped
with a pre-designed pattern called "trigger". Unfortunately, it is difficult to
distinguish between clean and backdoored models without prior knowledge of the
trigger. This paper proposes a backdoor detection method by utilizing a special
type of adversarial attack, universal adversarial perturbation (UAP), and its
similarities with a backdoor trigger. We observe an intuitive phenomenon: UAPs
generated from backdoored models need fewer perturbations to mislead the model
than UAPs from clean models. UAPs of backdoored models tend to exploit the
shortcut from all classes to the target class, built by the backdoor trigger.
We propose a novel method called Universal Soldier for Backdoor detection (USB)
and reverse engineering potential backdoor triggers via UAPs. Experiments on
345 models trained on several datasets show that USB effectively detects the
injected backdoor and provides comparable or better results than
state-of-the-art methods.
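The observation above (a backdoored model needs a smaller UAP toward its target class) suggests a simple decision step: compute one targeted UAP per class, then look for a class whose perturbation norm is unusually small. The abstract does not state USB's actual decision rule, so the following is only a minimal sketch assuming a median-absolute-deviation (MAD) outlier test. The function names, the example norms, and the cutoff of 2.0 are all hypothetical and chosen for illustration.

```python
from statistics import median


def anomaly_indices(norms):
    """Map each class to a MAD-based anomaly index.

    A large positive index means the class's UAP norm is unusually
    small compared with the other classes.
    """
    values = list(norms.values())
    med = median(values)
    # 1.4826 scales the MAD so that it estimates the standard
    # deviation when the data are normally distributed.
    mad = 1.4826 * median([abs(v - med) for v in values])
    if mad == 0:
        # All norms (or most of them) are identical: nothing stands out.
        return {label: 0.0 for label in norms}
    return {label: (med - v) / mad for label, v in norms.items()}


def flag_suspect_classes(norms, threshold=2.0):
    """Return the classes whose anomaly index exceeds the (assumed) cutoff."""
    indices = anomaly_indices(norms)
    return sorted(label for label, idx in indices.items() if idx > threshold)


# Hypothetical per-class norms of targeted UAPs; not taken from the paper.
uap_norms = {0: 41.0, 1: 39.5, 2: 43.2, 3: 8.7, 4: 40.1}
suspects = flag_suspect_classes(uap_norms)
print(suspects)
```

With these made-up numbers, only class 3 is flagged, because its norm sits far below the median of the others. In practice the norms would come from an actual UAP optimization against the model under inspection, which this sketch does not implement.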
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend.
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
(2024-10-25T09:36:04Z)
- PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models [5.957580737396457]
Diffusion models (DMs) are advanced deep learning models that have achieved state-of-the-art capability on a wide range of generative tasks.
Recent studies have shown their vulnerability to backdoor attacks, in which backdoored DMs consistently generate a designated result called the backdoor target.
We introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs.
(2024-09-20T23:19:26Z)
- Towards Unified Robustness Against Both Backdoor and Adversarial Attacks [31.846262387360767]
Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks.
This paper reveals that there is an intriguing connection between backdoor and adversarial attacks.
A novel Progressive Unified Defense algorithm is proposed to defend against backdoor and adversarial attacks simultaneously.
(2024-05-28T07:50:00Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
(2024-05-25T07:52:26Z)
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection [42.021282816470794]
We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs).
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
(2023-08-23T21:47:06Z)
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization show that our proposed methods can achieve an attack success rate of over 90% on multiple datasets and models.
(2023-05-03T20:31:13Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
(2022-11-02T15:39:19Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
(2021-09-12T12:44:52Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
(2021-03-24T12:06:40Z)
- BAAAN: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models [21.06679566096713]
We explore one of the most severe attacks against machine learning models, namely the backdoor attack, against both autoencoders and GANs.
The backdoor attack is a training-time attack in which the adversary implants a hidden backdoor in the target model that can only be activated by a secret trigger.
We extend the applicability of backdoor attacks to autoencoders and GAN-based models.
(2020-10-06T20:26:16Z)