On Certifying Robustness against Backdoor Attacks via Randomized
Smoothing
- URL: http://arxiv.org/abs/2002.11750v4
- Date: Mon, 20 Jul 2020 16:15:42 GMT
- Title: On Certifying Robustness against Backdoor Attacks via Randomized
Smoothing
- Authors: Binghui Wang, Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong
- Abstract summary: We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
Existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks.
- Score: 74.79764677396773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor attacks are a severe security threat to deep neural networks (DNNs).
We envision that, like adversarial examples, there will be a cat-and-mouse game
for backdoor attacks, i.e., new empirical defenses are developed to defend
against backdoor attacks but they are soon broken by strong adaptive backdoor
attacks. To prevent such a cat-and-mouse game, we take the first step towards
certified defenses against backdoor attacks. Specifically, in this work, we
study the feasibility and effectiveness of certifying robustness against
backdoor attacks using a recent technique called randomized smoothing.
Randomized smoothing was originally developed to certify robustness against
adversarial examples. We generalize randomized smoothing to defend against
backdoor attacks. Our results show the theoretical feasibility of using
randomized smoothing to certify robustness against backdoor attacks. However,
we also find that existing randomized smoothing methods have limited
effectiveness at defending against backdoor attacks, which highlights the need
for new theory and methods to certify robustness against backdoor attacks.
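For intuition, the sketch below illustrates standard randomized smoothing as used to certify robustness against adversarial examples (Cohen et al., 2019), which is the technique this paper generalizes to the backdoor setting. It is a minimal illustration under stated assumptions, not the paper's algorithm: the function name, the base_classifier interface, and the parameters sigma, n, and alpha are chosen for the example only.

import numpy as np
from scipy.stats import binomtest, norm

def smoothed_predict_and_certify(base_classifier, x, sigma=0.5, n=1000, alpha=0.001):
    # Monte Carlo estimate of the smoothed classifier
    #   g(x) = argmax_c  P[ f(x + eps) = c ],  eps ~ N(0, sigma^2 I),
    # together with a certified L2 radius in the style of Cohen et al. (2019).
    # `base_classifier` is assumed to map a batch of inputs to integer class labels.
    noise = sigma * np.random.randn(n, *x.shape)
    votes = base_classifier(x[None] + noise)          # shape (n,), integer labels
    counts = np.bincount(votes)
    top_class = int(counts.argmax())

    # Clopper-Pearson lower confidence bound on the top-class probability.
    p_lower = binomtest(int(counts[top_class]), n).proportion_ci(
        confidence_level=1 - alpha, method="exact").low
    if p_lower <= 0.5:
        return None, 0.0                              # abstain: no certificate
    # Any perturbation of x with L2 norm below `radius` cannot change g's prediction.
    radius = sigma * norm.ppf(p_lower)
    return top_class, radius

In this vanilla form the noise is added only to the test input; certifying against backdoor attacks additionally requires reasoning about perturbations of the training data, which is where the paper finds existing smoothing methods to be of limited effectiveness.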
Related papers
- Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack [32.74007523929888]
We re-investigate the characteristics of backdoored models after defense.
We find that the original backdoors still exist in defense models derived from existing post-training defense strategies.
We empirically show that these dormant backdoors can be easily re-activated during inference.
arXiv Detail & Related papers (2024-05-25T08:57:30Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Beating Backdoor Attack at Its Own Game [10.131734154410763]
Deep neural networks (DNNs) are vulnerable to backdoor attack.
Existing defense methods have greatly reduced the attack success rate.
We propose a highly effective framework which injects non-adversarial backdoors targeting poisoned samples.
arXiv Detail & Related papers (2023-07-28T13:07:42Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Defending Against Stealthy Backdoor Attacks [1.6453255188693543]
Recent works have shown that it is not difficult to attack a natural language processing (NLP) model, while defending against such attacks is still a cat-and-mouse game.
In this work, we present a few defense strategies that can be useful for countering such attacks.
arXiv Detail & Related papers (2022-05-27T21:38:42Z)
- Rethink Stealthy Backdoor Attacks in Natural Language Processing [35.6803390044542]
The capacity of stealthy backdoor attacks is overestimated when they are categorized as backdoor attacks.
We propose a new metric called attack successful rate difference (ASRD), which measures the ASR difference between clean state and poison state models.
Our method achieves significantly better performance than state-of-the-art defense methods against stealthy backdoor attacks.
arXiv Detail & Related papers (2022-01-09T12:34:12Z)
- ONION: A Simple and Effective Defense Against Textual Backdoor Attacks [91.83014758036575]
Backdoor attacks are an emergent training-time threat to deep neural networks (DNNs).
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
arXiv Detail & Related papers (2020-11-20T12:17:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.