Can Backdoor Attacks Survive Time-Varying Models?
- URL: http://arxiv.org/abs/2206.04677v1
- Date: Wed, 8 Jun 2022 01:32:49 GMT
- Title: Can Backdoor Attacks Survive Time-Varying Models?
- Authors: Huiying Li, Arjun Nitin Bhagoji, Ben Y. Zhao, Haitao Zheng
- Abstract summary: Backdoors are powerful attacks against deep neural networks (DNNs).
We study the impact of backdoor attacks on a more realistic scenario of time-varying DNN models.
Our results show that one-shot backdoor attacks do not survive past a few model updates.
- Score: 35.836598031681426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoors are powerful attacks against deep neural networks (DNNs). By
poisoning training data, attackers can inject hidden rules (backdoors) into
DNNs, which only activate on inputs containing attack-specific triggers. While
existing work has studied backdoor attacks on a variety of DNN models, they
only consider static models, which remain unchanged after initial deployment.
In this paper, we study the impact of backdoor attacks on a more realistic
scenario of time-varying DNN models, where model weights are updated
periodically to handle drifts in data distribution over time. Specifically, we
empirically quantify the "survivability" of a backdoor against model updates,
and examine how attack parameters, data drift behaviors, and model update
strategies affect backdoor survivability. Our results show that one-shot
backdoor attacks (i.e., only poisoning training data once) do not survive past
a few model updates, even when attackers aggressively increase trigger size and
poison ratio. To stay unaffected by model updates, attackers must continuously
introduce corrupted data into the training pipeline. Together, these results
indicate that when models are updated to learn new data, they also "forget"
backdoors as hidden, malicious features. The larger the distribution shift
between old and new training data, the faster backdoors are forgotten.
Leveraging these insights, we apply a smart learning rate scheduler to further
accelerate backdoor forgetting during model updates, which prevents one-shot
backdoors from surviving past a single model update.
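To make the measurement loop described above concrete, the following is a minimal sketch: poison the training data once, retrain periodically on fresh clean data drawn from a drifting distribution, and track the attack success rate (ASR) after each update. The toy logistic-regression model, Gaussian drift, single-feature trigger, poison ratio, and learning rates are assumptions chosen here for illustration only, not the paper's actual models or experimental pipeline.

```python
# Illustrative sketch of backdoor "survivability" under periodic model updates.
# All settings below are assumptions for this example, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
DIM, TARGET, POISON_RATIO = 10, 1, 0.10      # assumed, illustrative settings
TRIGGER_IDX, TRIGGER_VALUE = 0, 8.0          # "trigger" = one feature pinned to a fixed value

def make_clean_batch(n, drift=0.0):
    """Two-class Gaussian data plus a bias column; `drift` shifts the class means over time."""
    y = rng.integers(0, 2, n)
    x = rng.normal(loc=(y[:, None] * 2.0 - 1.0) + drift, size=(n, DIM))
    return np.hstack([x, np.ones((n, 1))]), y

def poison(x, y, ratio):
    """One-shot poisoning: stamp the trigger on a fraction of samples and relabel them to TARGET."""
    x, y = x.copy(), y.copy()
    idx = rng.choice(len(y), int(ratio * len(y)), replace=False)
    x[idx, TRIGGER_IDX] = TRIGGER_VALUE
    y[idx] = TARGET
    return x, y

def train(w, x, y, lr, epochs=20):
    """Full-batch gradient descent on the logistic loss (a stand-in for DNN training)."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w = w - lr * (x.T @ (p - y)) / len(y)
    return w

def attack_success_rate(w, n=2000):
    """Fraction of triggered non-target inputs that the model classifies as TARGET."""
    x, y = make_clean_batch(n)
    x = x[y != TARGET]
    x[:, TRIGGER_IDX] = TRIGGER_VALUE
    return float(((x @ w > 0).astype(int) == TARGET).mean())

# One-shot attack: poison the initial training set once.
w = train(np.zeros(DIM + 1), *poison(*make_clean_batch(5000), POISON_RATIO), lr=0.5)
print("ASR after initial (poisoned) training:", attack_success_rate(w))

# Periodic updates on clean, gradually drifting data. Using a larger learning
# rate during updates loosely mimics a scheduler that speeds up forgetting.
for t in range(1, 6):
    w = train(w, *make_clean_batch(5000, drift=0.2 * t), lr=1.0)
    print(f"ASR after clean update {t}:", attack_success_rate(w))
```

In this toy setting one would expect the printed ASR to decay across the clean updates, mirroring the "forgetting" effect the abstract describes; larger drift or a larger update learning rate should make it decay faster.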
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend (EBYD).
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs [1.8907257686468144]
Bad actors looking to create successful backdoors must design them to avoid activation during training and evaluation.
Current large language models (LLMs) can distinguish past from future events, with probes on model activations achieving 90% accuracy.
We train models with backdoors triggered by a temporal distributional shift; the backdoors activate when models are exposed to news headlines beyond their training cut-off dates.
arXiv Detail & Related papers (2024-07-04T18:24:09Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification [0.0]
Backdoor attacks are a major threat to deep learning systems in safety-critical scenarios.
In this paper, we show that backdoor attacks can be achieved without any model modification.
We implement PatchBackdoor in real-world scenarios and show that the attack is still threatening.
arXiv Detail & Related papers (2023-08-22T23:02:06Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Neurotoxin: Durable Backdoors in Federated Learning [73.82725064553827]
Federated learning systems have an inherent vulnerability to adversarial backdoor attacks during training.
We propose Neurotoxin, a simple one-line modification to existing backdoor attacks that works by attacking parameters that change relatively little in magnitude during training (see the sketch after this entry).
arXiv Detail & Related papers (2022-06-12T16:52:52Z)
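A minimal sketch of that one-line idea follows, assuming the attacker observes a recent benign model update and masks its own malicious update to the coordinates that benign training changes least. The 10% mask fraction and the use of a single benign update are illustrative assumptions, not the authors' exact procedure.

```python
# Illustrative sketch (not the authors' code): restrict a malicious update to
# coordinates where benign training changes parameters the least, so later
# benign updates are less likely to overwrite the backdoor.
import numpy as np

def mask_to_rarely_updated_coords(malicious_update, benign_update, drop_frac=0.1):
    """Zero the malicious update on the `drop_frac` of coordinates that the
    benign update changes most; keep it only where benign changes are small."""
    k = int(len(benign_update) * drop_frac)
    masked = malicious_update.copy()
    if k > 0:
        top_benign = np.argsort(np.abs(benign_update))[-k:]  # most-changed coordinates
        masked[top_benign] = 0.0
    return masked

rng = np.random.default_rng(1)
benign = rng.normal(size=1000)       # stand-in for an observed benign model update
malicious = rng.normal(size=1000)    # attacker's desired backdoor update
masked = mask_to_rarely_updated_coords(malicious, benign)
print("coordinates kept:", int(np.count_nonzero(masked)))   # roughly 900 of 1000
```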
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Anti-Backdoor Learning: Training Clean Models on Poisoned Data [17.648453598314795]
Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs).
We introduce the concept of anti-backdoor learning, aiming to train clean models given backdoor-poisoned data.
We empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data.
arXiv Detail & Related papers (2021-10-22T03:30:48Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks [46.99548490594115]
A backdoor attack installs a backdoor into the victim model by injecting a backdoor pattern into a small proportion of the training data.
We propose reflection backdoor (Refool) to plant reflections as a backdoor in a victim model.
We demonstrate on 3 computer vision tasks and 5 datasets that Refool can attack state-of-the-art DNNs with a high success rate.
arXiv Detail & Related papers (2020-07-05T13:56:48Z)