Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks
Trained from Scratch
- URL: http://arxiv.org/abs/2106.08970v1
- Date: Wed, 16 Jun 2021 17:09:55 GMT
- Title: Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks
Trained from Scratch
- Authors: Hossein Souri, Micah Goldblum, Liam Fowl, Rama Chellappa, Tom
Goldstein
- Abstract summary: Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data.
This vulnerability is then activated at inference time by placing a "trigger" into the model's input.
We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process.
- Score: 99.90716010490625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the curation of data for machine learning becomes increasingly automated,
dataset tampering is a mounting threat. Backdoor attackers tamper with training
data to embed a vulnerability in models that are trained on that data. This
vulnerability is then activated at inference time by placing a "trigger" into
the model's input. Typical backdoor attacks insert the trigger directly into
the training data, although the presence of such an attack may be visible upon
inspection. In contrast, the Hidden Trigger Backdoor Attack achieves poisoning
without placing a trigger into the training data at all. However, this hidden
trigger attack is ineffective at poisoning neural networks trained from
scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs
gradient matching, data selection, and target model re-training during the
crafting process. Sleeper Agent is the first hidden trigger backdoor attack to
be effective against neural networks trained from scratch. We demonstrate its
effectiveness on ImageNet and in black-box settings. Our implementation code
can be found at https://github.com/hsouri/Sleeper-Agent.
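
The abstract names three components of the crafting procedure: gradient matching, data selection, and re-training of the surrogate model during crafting. The snippet below is a minimal, hypothetical PyTorch sketch of the gradient-matching step only: it optimizes bounded perturbations on clean target-class images so that the training gradient they induce aligns with the gradient of trigger-patched source-class images labeled with the attacker's target class. All function names, hyperparameters, and the signed-gradient update are illustrative assumptions rather than the authors' implementation (the linked repository is the authoritative reference), and the data-selection and re-training steps are omitted.

```python
# Hypothetical sketch of gradient-matching poison crafting in the spirit of
# Sleeper Agent; names and hyperparameters are illustrative, not the authors' API.
import torch


def apply_patch(images, patch, loc=(0, 0)):
    """Paste a small trigger patch onto a batch of images at a fixed location."""
    x, y = loc
    images = images.clone()
    images[:, :, x:x + patch.shape[-2], y:y + patch.shape[-1]] = patch
    return images


def grad_matching_loss(model, loss_fn, poison_x, poison_y, target_x, target_y):
    """1 - cosine similarity between the training gradient induced by the poisoned
    samples and the adversarial gradient of patched source images given the
    attacker's target label."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_adv = torch.autograd.grad(loss_fn(model(target_x), target_y), params)
    g_poison = torch.autograd.grad(loss_fn(model(poison_x), poison_y), params,
                                   create_graph=True)  # differentiable w.r.t. delta
    num = sum((ga.detach() * gp).sum() for ga, gp in zip(g_adv, g_poison))
    denom = (sum(ga.pow(2).sum() for ga in g_adv).sqrt() *
             sum(gp.pow(2).sum() for gp in g_poison).sqrt())
    return 1 - num / denom


def craft_poison(model, loss_fn, clean_x, clean_y, trigger, source_x, target_label,
                 steps=250, step_size=0.01, eps=16 / 255):
    """Optimize bounded perturbations on clean target-class images so that training
    on them implants the hidden trigger (gradient-matching step only)."""
    patched = apply_patch(source_x, trigger)
    target_y = torch.full((source_x.size(0),), target_label,
                          dtype=torch.long, device=source_x.device)
    delta = torch.zeros_like(clean_x, requires_grad=True)
    for _ in range(steps):
        loss = grad_matching_loss(model, loss_fn, clean_x + delta, clean_y,
                                  patched, target_y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= step_size * grad.sign()                       # signed descent
            delta.clamp_(-eps, eps)                                # keep poisons subtle
            delta.copy_((clean_x + delta).clamp(0, 1) - clean_x)   # valid pixel range
    return (clean_x + delta).detach()
```

In a typical use, `loss_fn` would be `torch.nn.functional.cross_entropy`, the adversarial gradient would be computed once per surrogate (re-)training checkpoint rather than at every crafting step, and the perturbation budget would be kept small enough that the poisoned images remain visually clean, which is what makes the trigger "hidden".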
Related papers
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z) - Narcissus: A Practical Clean-Label Backdoor Attack with Limited
Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z) - Can You Hear It? Backdoor Attacks via Ultrasonic Triggers [31.147899305987934]
In this work, we explore backdoor attacks on automatic speech recognition systems in which we inject inaudible triggers.
Our results indicate that less than 1% of poisoned data is sufficient to deploy a backdoor attack and reach a 100% attack success rate.
arXiv Detail & Related papers (2021-07-30T12:08:16Z) - Backdoor Attack in the Physical World [49.64799477792172]
A backdoor attack aims to inject a hidden backdoor into deep neural networks (DNNs).
Most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that this attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2021-04-06T08:37:33Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Input-Aware Dynamic Backdoor Attack [9.945411554349276]
In recent years, neural backdoor attacks have been recognized as a potential security threat to deep learning systems.
Current backdoor techniques rely on uniform trigger patterns, which are easily detected and mitigated by current defense methods.
We propose a novel backdoor attack technique in which the triggers vary from input to input.
arXiv Detail & Related papers (2020-10-16T03:57:12Z) - Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural
Networks [22.28270345106827]
Current state-of-the-art backdoor attacks require the adversary to modify the input, usually by adding a trigger to it, for the target model to activate the backdoor.
This added trigger not only increases the difficulty of launching the backdoor attack in the physical world, but also can be easily detected by multiple defense mechanisms.
We present the first triggerless backdoor attack against deep neural networks, where the adversary does not need to modify the input for triggering the backdoor.
arXiv Detail & Related papers (2020-10-07T09:01:39Z) - Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area (illustrated in the sketch after this list).
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)
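
Several of the entries above (BATT, "Backdoor Attack in the Physical World", "Rethinking the Trigger of Backdoor Attack") revolve around this static-trigger setting. The following hypothetical snippet, with stand-in tensors and a made-up `stamp` helper rather than code from any of the papers, is only meant to picture what "static" means: the patch stamped on poisoned training images and the patch applied at test time share the same appearance and location, so transforming the trigger at test time violates that assumption.

```python
# Hypothetical illustration of the static-trigger setting; all tensors and the
# stamp() helper are stand-ins, not code from any of the papers listed above.
import torch


def stamp(images, trigger, top=0, left=0):
    """Paste a fixed trigger patch onto every image at the given corner."""
    out = images.clone()
    h, w = trigger.shape[-2:]
    out[:, :, top:top + h, left:left + w] = trigger
    return out


x_train = torch.rand(8, 3, 32, 32)   # stand-in batch of training images
x_test = torch.rand(8, 3, 32, 32)    # stand-in batch of test images
trigger = torch.ones(3, 4, 4)        # a plain white 4x4 square as the trigger

# Static setting: the same patch, in the same corner, at train and test time.
x_train_poisoned = stamp(x_train, trigger, top=0, left=0)
x_test_triggered = stamp(x_test, trigger, top=0, left=0)

# A spatially transformed test-time trigger (e.g. shifted by a few pixels)
# breaks the static assumption that these papers analyze or exploit.
x_test_shifted = stamp(x_test, trigger, top=5, left=5)
```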
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences of its use.