A Partial Break of the Honeypots Defense to Catch Adversarial Attacks
- URL: http://arxiv.org/abs/2009.10975v1
- Date: Wed, 23 Sep 2020 07:36:37 GMT
- Title: A Partial Break of the Honeypots Defense to Catch Adversarial Attacks
- Authors: Nicholas Carlini
- Abstract summary: We break the baseline version of this defense by reducing the detection true positive rate to 0% and the detection AUC to 0.02.
To aid further research, we release the complete 2.5 hour keystroke-by-keystroke screen recording of our attack process at https://nicholas.carlini.com/code/ccs_honeypot_break.
- Score: 57.572998144258705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent defense proposes to inject "honeypots" into neural networks in order
to detect adversarial attacks. We break the baseline version of this defense by
reducing the detection true positive rate to 0% and the detection AUC to 0.02,
while maintaining the original distortion bounds. The authors of the original paper
have amended the defense in their CCS'20 paper to mitigate this attack. To aid
further research, we release the complete 2.5 hour keystroke-by-keystroke
screen recording of our attack process at
https://nicholas.carlini.com/code/ccs_honeypot_break.
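The abstract does not spell out how the break works. As a rough illustration only, the sketch below shows one detection-aware PGD variant in PyTorch, assuming the defense flags inputs whose penultimate-layer activations have high cosine similarity to a known trapdoor signature; `model`, `features`, `trapdoor_sig`, and all hyperparameters are hypothetical placeholders, not the attack shown in the linked recording.

```python
import torch
import torch.nn.functional as F

def detection_aware_pgd(model, features, trapdoor_sig, x, y,
                        eps=8 / 255, alpha=2 / 255, steps=100, sim_weight=1.0):
    """Untargeted PGD that also keeps the feature-space cosine similarity to an
    (assumed) trapdoor signature low, so a similarity-threshold detector would
    not fire. All components are illustrative placeholders, not the paper's attack."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)                    # class logits
        feats = features(x_adv)                  # penultimate-layer activations
        cls_loss = F.cross_entropy(logits, y)    # push away from the true label
        sim = F.cosine_similarity(feats, trapdoor_sig.expand_as(feats), dim=1).mean()
        loss = cls_loss - sim_weight * sim       # ascend the loss, descend the similarity
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # keep the original L-inf bound
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

The point of such a sketch is simply that the (assumed) detection statistic can be folded into the attack objective while the original distortion bound is respected; whether this resembles the actual break can only be checked against the paper and the recording.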
Related papers
- Diffusion Denoising as a Certified Defense against Clean-label Poisoning [56.04951180983087]
We show how an off-the-shelf diffusion model can sanitize the tampered training data.
We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy.
arXiv Detail & Related papers (2024-03-18T17:17:07Z)
- Beating Backdoor Attack at Its Own Game [10.131734154410763]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Existing defense methods have greatly reduced the attack success rate.
We propose a highly effective framework which injects non-adversarial backdoors targeting poisoned samples.
arXiv Detail & Related papers (2023-07-28T13:07:42Z)
- The Best Defense is a Good Offense: Adversarial Augmentation against Adversarial Attacks [91.56314751983133]
$A5$ is a framework that crafts a defensive perturbation to guarantee that any attack against the input at hand will fail.
We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground truth label.
We also show how to apply $A5$ to create certifiably robust physical objects.
arXiv Detail & Related papers (2023-05-23T16:07:58Z)
- PECAN: A Deterministic Certified Defense Against Backdoor Attacks [17.0639534812572]
We present PECAN, an efficient and certified approach for defending against backdoor attacks.
We evaluate PECAN on image classification and malware detection datasets.
arXiv Detail & Related papers (2023-01-27T16:25:43Z)
- Defending Against Stealthy Backdoor Attacks [1.6453255188693543]
Recent works have shown that it is not difficult to attack natural language processing (NLP) models, while defending against such attacks remains a cat-and-mouse game.
In this work, we present a few defense strategies that can be useful for countering such attacks.
arXiv Detail & Related papers (2022-05-27T21:38:42Z)
- HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios [23.898803100714957]
"Low-confidence backdoor attack" exploits confidence labels assigned to poisoned training samples.
"HaS-Nets" can decrease ASRs from over 90% to less than 15%, independent of the dataset.
arXiv Detail & Related papers (2020-12-14T12:47:41Z)
- Backdoor Attacks to Graph Neural Networks [73.56867080030091]
We propose the first backdoor attack on graph neural networks (GNNs).
In our backdoor attack, a GNN predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected into the testing graph.
Our empirical results show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean testing graphs.
arXiv Detail & Related papers (2020-06-19T14:51:01Z)
- Certified Defenses for Adversarial Patches [72.65524549598126]
Adversarial patch attacks are among the most practical threat models against real-world computer vision systems.
This paper studies certified and empirical defenses against patch attacks.
arXiv Detail & Related papers (2020-03-14T19:57:31Z)
- On Certifying Robustness against Backdoor Attacks via Randomized Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
Existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks; a minimal sketch of the basic smoothing procedure appears after this list.
arXiv Detail & Related papers (2020-02-26T19:15:46Z)
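For context on the last related paper, the following is a minimal sketch of the generic randomized smoothing prediction rule (majority vote over Gaussian-noised copies of the input), assuming a PyTorch image classifier; it illustrates the base technique only and is not the backdoor-specific certification studied in that paper. The function name and hyperparameters are illustrative placeholders.

```python
import torch

def smoothed_predict(model, x, sigma=0.25, n_samples=1000, batch_size=100, num_classes=10):
    """Majority-vote prediction of a Gaussian-smoothed classifier.
    `model` maps a batch of images to logits; `x` is a single image tensor
    of shape (C, H, W) with values in [0, 1]. Hyperparameters are illustrative."""
    counts = torch.zeros(num_classes, dtype=torch.long)
    with torch.no_grad():
        remaining = n_samples
        while remaining > 0:
            b = min(batch_size, remaining)
            # Classify b noisy copies of x and tally the predicted classes.
            noisy = x.unsqueeze(0).repeat(b, 1, 1, 1) + sigma * torch.randn(b, *x.shape)
            preds = model(noisy).argmax(dim=1)
            counts += torch.bincount(preds, minlength=num_classes)
            remaining -= b
    return counts.argmax().item()  # the smoothed classifier's predicted class
```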
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.