Related papers: Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack

URL: http://arxiv.org/abs/2405.16134v2
Date: Thu, 30 May 2024 04:45:11 GMT
Title: Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack
Authors: Mingli Zhu, Siyuan Liang, Baoyuan Wu
Abstract summary: We re-investigate the characteristics of backdoored models after defense. We find that the original backdoors still exist in defense models derived from existing post-training defense strategies. We empirically show that these dormant backdoors can be easily re-activated during inference.
Score: 32.74007523929888
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks face persistent challenges in defending against backdoor attacks, leading to an ongoing battle between attacks and defenses. While existing backdoor defense strategies have shown promising performance in reducing attack success rates, can we confidently claim that the backdoor threat has truly been eliminated from the model? To address this question, we re-investigate the characteristics of backdoored models after defense (denoted as defense models). Surprisingly, we find that the original backdoors still exist in defense models derived from existing post-training defense strategies, where backdoor existence is measured by a novel metric called the backdoor existence coefficient. This implies that the backdoors merely lie dormant rather than being eliminated. To further verify this finding, we empirically show that these dormant backdoors can be easily re-activated during inference by manipulating the original trigger with a well-designed tiny perturbation generated by a universal adversarial attack. More practically, we extend our backdoor re-activation to the black-box scenario, where the adversary can only query the defense model during inference, and develop two effective methods, i.e., query-based and transfer-based backdoor re-activation attacks. The effectiveness of the proposed methods is verified on both image classification and multimodal contrastive learning (i.e., CLIP) tasks. In conclusion, this work uncovers a critical vulnerability that has never been explored in existing defense strategies, emphasizing the urgency of designing more robust and advanced backdoor defense mechanisms in the future.

Related papers

Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack that does not change the model architecture. We verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses. Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z)
Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be poisoned. We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
A general approach to enhance the survivability of backdoor attacks by decision path coupling [8.361829415535018]
We propose Venom, the first generic backdoor attack enhancer to improve the survivability of existing backdoor attacks against model reconstruction-based defenses. To improve survivability, we propose an attention imitation loss that forces the decision path of poisoned samples to couple with the crucial decision path of benign samples. Our evaluation on two enhancers and three datasets demonstrates that Venom significantly improves the survivability of eight state-of-the-art attacks against eight state-of-the-art defenses.
arXiv Detail & Related papers (2024-03-05T13:21:20Z)
BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting [21.91491621538245]
We propose and investigate a new characteristic of backdoor attacks, namely, backdoor exclusivity. Backdoor exclusivity measures the ability of backdoor triggers to remain effective in the presence of input variation. Our approach substantially enhances the stealthiness of four old-school backdoor attacks, at almost no cost to the attack success rate and normal utility.
arXiv Detail & Related papers (2023-12-08T08:35:16Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threat, in this practical scenario, that backdoor attacks can remain effective even after defenses. We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them. We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks. Backdoor attacks are an emerging yet serious training-phase threat. We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
Contributor-Aware Defenses Against Adversarial Backdoor Attacks [2.830541450812474]
Adversarial backdoor attacks have demonstrated the capability to perform targeted misclassification of specific examples. We propose a contributor-aware universal defensive framework for learning in the presence of multiple, potentially adversarial data sources. Our empirical studies demonstrate the robustness of the proposed framework against adversarial backdoor attacks from multiple simultaneous adversaries.
arXiv Detail & Related papers (2022-05-28T20:25:34Z)
Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks intend to embed hidden backdoors into deep neural networks (DNNs). Backdoor learning is an emerging and rapidly growing research area. This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
On Certifying Robustness against Backdoor Attacks via Randomized Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing. Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks. Existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks.
arXiv Detail & Related papers (2020-02-26T19:15:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.