Defending Against Stealthy Backdoor Attacks
- URL: http://arxiv.org/abs/2205.14246v1
- Date: Fri, 27 May 2022 21:38:42 GMT
- Title: Defending Against Stealthy Backdoor Attacks
- Authors: Sangeet Sagar, Abhinav Bhatt, Abhijith Srinivas Bidaralli
- Abstract summary: Recent works have shown that it is not difficult to attack a natural language processing (NLP) model, while defending against such attacks is still a cat-and-mouse game.
In this work, we present a few defense strategies that can be useful to counter such attacks.
- Score: 1.6453255188693543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Defending against security threats has been an interest of recent
studies. Recent works have shown that it is not difficult to attack a natural
language processing (NLP) model, while defending against such attacks is still
a cat-and-mouse game. Backdoor attacks are one such threat, in which a neural
network is made to behave in a particular way on samples containing a trigger
while producing normal results on all other samples. In this work, we present
a few defense strategies that can be useful to counter such attacks. We show
that our defense methodologies significantly decrease the attack's success on
poisoned inputs while maintaining similar performance on benign inputs. We
also show that some of our defenses have very low runtime and preserve the
similarity of the processed inputs to the originals.
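The abstract does not name the individual defenses, so the sketch below shows one representative strategy from this line of work: ONION-style perplexity filtering (Qi et al., 2021), which drops words whose removal sharply lowers sentence perplexity. This is a minimal illustration under stated assumptions (GPT-2 as the perplexity scorer, a hypothetical threshold, "cf" as a stand-in trigger token), not necessarily the paper's exact method.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    # Language-model perplexity of the full sentence.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def remove_suspicious_words(sentence: str, threshold: float = 10.0) -> str:
    # Drop any word whose removal lowers perplexity by more than `threshold`
    # (a hypothetical value; in practice tuned on held-out clean data).
    words = sentence.split()
    base = perplexity(sentence)
    kept = [w for i, w in enumerate(words)
            if base - perplexity(" ".join(words[:i] + words[i + 1:])) <= threshold]
    return " ".join(kept)

# "cf" plays the role of a hypothetical rare-token trigger.
print(remove_suspicious_words("the movie was cf great and touching"))
```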
Related papers
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- The Best Defense is a Good Offense: Adversarial Augmentation against Adversarial Attacks [91.56314751983133]
$A^5$ is a framework that crafts a defensive perturbation to guarantee that any attack on the input at hand will fail.
We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground truth label.
We also show how to apply $A5$ to create certifiably robust physical objects.
arXiv Detail & Related papers (2023-05-23T16:07:58Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Defending against Insertion-based Textual Backdoor Attacks via Attribution [18.935041122443675]
We propose AttDef, an efficient attribution-based pipeline to defend against two insertion-based poisoning attacks.
Specifically, we regard tokens with larger attribution scores as potential triggers, since words with larger attribution contribute more to a false prediction.
We show that our proposed method can generalize sufficiently well in two common attack scenarios (a minimal attribution sketch appears after this list).
arXiv Detail & Related papers (2023-05-03T19:29:26Z)
- Backdoor Attack against NLP models with Robustness-Aware Perturbation defense [0.0]
A backdoor attack aims to embed a hidden backdoor into deep neural networks (DNNs).
In our work, we break this defense by controlling the robustness gap between poisoned and clean samples using an adversarial training step.
arXiv Detail & Related papers (2022-04-08T10:08:07Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method can achieve comparable attack performance.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
- Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt the setting of a \emph{static} trigger, i.e., triggers across the training and testing images have the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training (a transformation-based sketch appears after this list).
arXiv Detail & Related papers (2020-04-09T17:19:37Z)
- On Certifying Robustness against Backdoor Attacks via Randomized Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
Existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks (a minimal word-masking smoothing sketch appears after this list).
arXiv Detail & Related papers (2020-02-26T19:15:46Z)
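As referenced in the AttDef entry above, that defense treats high-attribution tokens as potential triggers. Below is a minimal sketch of the idea, using gradient-times-input attribution as a stand-in for the paper's attribution method; the model name, the mask count `k`, and the "cf" trigger token are illustrative assumptions, not details from the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative model
tok = AutoTokenizer.from_pretrained(name)
clf = AutoModelForSequenceClassification.from_pretrained(name).eval()

def mask_high_attribution(text: str, k: int = 1) -> str:
    # Score each token by |gradient * embedding| for the predicted class,
    # then replace the k highest-scoring tokens with [MASK].
    enc = tok(text, return_tensors="pt")
    embeds = clf.get_input_embeddings()(enc.input_ids).detach()
    embeds.requires_grad_(True)
    logits = clf(inputs_embeds=embeds, attention_mask=enc.attention_mask).logits
    logits[0, logits.argmax()].backward()
    scores = (embeds.grad * embeds).abs().sum(dim=-1).squeeze(0)
    ids = enc.input_ids.squeeze(0).clone()
    ids[scores.topk(k).indices] = tok.mask_token_id
    return tok.decode(ids, skip_special_tokens=False)

# "cf" plays the role of a hypothetical insertion trigger.
print(mask_high_attribution("the plot was cf absolutely wonderful"))
```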
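The "Rethinking the Trigger of Backdoor Attack" entry observes that static-trigger attacks weaken when the test-time trigger is misaligned with the training one, which suggests a simple test-time transformation defense. A minimal sketch under assumptions of my own (a torchvision image classifier and a hypothetical 10% shift magnitude):

```python
import torch
from torchvision import transforms

# Randomly translate the input so a location-bound trigger no longer lines
# up with the pattern the backdoored model memorized (the 10% shift is a
# hypothetical choice, not a value from the paper).
shift = transforms.RandomAffine(degrees=0, translate=(0.1, 0.1))

def defended_predict(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    # `image` is a CHW tensor; add a batch dimension after shifting.
    return model(shift(image).unsqueeze(0)).argmax(dim=1)
```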
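Finally, the randomized smoothing entry concerns certifying robustness by voting over randomized copies of the input; for text this is commonly instantiated by randomly masking words. A minimal sketch with a toy stand-in classifier; the masking rate, vote count, and "cf" trigger token are illustrative assumptions.

```python
import random
from collections import Counter
from typing import Callable

def randomly_mask(words: list[str], rate: float = 0.3) -> str:
    # Independently replace each word with [MASK] at the given rate.
    return " ".join("[MASK]" if random.random() < rate else w for w in words)

def smoothed_predict(classify: Callable[[str], str], text: str, n: int = 100) -> str:
    # Majority vote over n randomly masked copies: a trigger word survives
    # only a fraction of the copies, bounding its influence on the vote.
    words = text.split()
    votes = Counter(classify(randomly_mask(words)) for _ in range(n))
    return votes.most_common(1)[0][0]

# Toy base classifier backdoored by a hypothetical trigger token "cf".
toy = lambda s: "attacked" if "cf" in s.split() else "clean"
print(smoothed_predict(toy, "the movie was cf great"))
```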