Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre
- URL: http://arxiv.org/abs/2405.03672v3
- Date: Mon, 1 Jul 2024 15:57:59 GMT
- Title: Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre
- Authors: Nicholas Carlini
- Abstract summary: Sabre is a defense to adversarial examples that was accepted at IEEE S&P 2024.
We first reveal significant flaws in the evaluation that point to clear signs of gradient masking.
We then show the cause of this gradient masking: a bug in the original evaluation code.
- Score: 64.55144029671106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sabre is a defense to adversarial examples that was accepted at IEEE S&P 2024. We first reveal significant flaws in the evaluation that point to clear signs of gradient masking. We then show the cause of this gradient masking: a bug in the original evaluation code. By fixing a single line of code in the original repository, we reduce Sabre's robust accuracy to 0%. In response to this, the authors modify the defense and introduce a new defense component not described in the original paper. But this fix contains a second bug; modifying one more line of code reduces robust accuracy to below baseline levels. After we released the first version of our paper online, the authors introduced another change to the defense; by commenting out one line of code during attack we reduce the robust accuracy to 0% again.
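The abstract does not spell out the bug or the one-line fix, so the snippet below is only a minimal sketch, under assumed names, of the standard way such gradient masking is diagnosed: run a plain PGD attack whose gradients flow through the entire defended pipeline, and check that an essentially unbounded version of the attack drives robust accuracy to roughly 0%. `defended_model`, the step sizes, and `masking_check` are illustrative placeholders, not Sabre's or the paper's actual code.

```python
# Illustrative sketch only: a PGD attack that keeps gradients flowing through the
# *entire* defended pipeline (preprocessing + classifier). If an attack like this,
# run with a large, effectively unbounded epsilon, does NOT drive robust accuracy
# to ~0%, the evaluation is likely suffering from gradient masking.
import torch
import torch.nn.functional as F

def pgd_attack(defended_model, x, y, eps=8 / 255, alpha=2 / 255, steps=100):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = defended_model(x_adv)          # gradients must flow end to end here
        loss = F.cross_entropy(logits, y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)           # project into the L-inf ball
            x_adv = x_adv.clamp(0.0, 1.0)                      # stay in valid image range
        x_adv = x_adv.detach()
    return x_adv

def masking_check(defended_model, loader, device="cpu"):
    """Gradient-masking sanity check: with eps ~= 1 (essentially unbounded for
    [0,1] images), robust accuracy should collapse to ~0%. If it does not,
    gradients through the defense are probably broken."""
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(defended_model, x, y, eps=1.0, alpha=0.05, steps=200)
        with torch.no_grad():
            correct += (defended_model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total
```

If a defense retains non-trivial accuracy even under the unbounded attack, the usual conclusion is that the evaluation, not the defense, is at fault.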
Related papers
- Predicting Likely-Vulnerable Code Changes: Machine Learning-based Vulnerability Protections for Android Open Source Project [0.0]
This paper presents a framework that selectively triggers security reviews for incoming source code changes.
The framework can automatically request additional security reviews at pre-submit time before the code changes are submitted to a source code repository.
arXiv Detail & Related papers (2024-05-26T18:17:46Z) - Increasing Confidence in Adversarial Robustness Evaluations [53.2174171468716]
We propose a test to identify weak attacks and thus weak defense evaluations.
Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample.
For eleven out of thirteen previously-published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it.
arXiv Detail & Related papers (2022-06-28T13:28:13Z) - Don't sweat the small stuff, classify the rest: Sample Shielding to protect text classifiers against adversarial attacks [2.512827436728378]
Deep learning (DL) is being used extensively for text classification.
Attackers modify the text in a way which misleads the classifier while keeping the original meaning close to intact.
We propose a novel and intuitive defense strategy called Sample Shielding.
arXiv Detail & Related papers (2022-05-03T18:24:20Z) - Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data; a toy sketch of this loop appears after the related-papers list below.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
arXiv Detail & Related papers (2021-06-11T20:31:04Z) - A Partial Break of the Honeypots Defense to Catch Adversarial Attacks [57.572998144258705]
We break the baseline version of this defense by reducing the detection true positive rate to 0% and the detection AUC to 0.02.
To aid further research, we release the complete 2.5 hour keystroke-by-keystroke screen recording of our attack process at https://nicholas.carlini.com/code/ccs_honeypot_break.
arXiv Detail & Related papers (2020-09-23T07:36:37Z) - Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z) - Detection as Regression: Certified Object Detection by Median Smoothing [50.89591634725045]
This work is motivated by recent progress on certified classification by randomized smoothing.
We obtain the first model-agnostic, training-free, and certified defense for object detection against $\ell_2$-bounded attacks.
arXiv Detail & Related papers (2020-07-07T18:40:19Z) - Robust and Accurate Authorship Attribution via Program Normalization [24.381734600088453]
Source code attribution approaches have achieved remarkable accuracy thanks to the rapid advances in deep learning.
However, they can be easily deceived by adversaries who attempt either to create a forgery of another author or to mask the original author.
We present a novel learning framework, $\textit{normalize-and-predict}$ ($\textit{N\&P}$), that in theory guarantees the robustness of any authorship-attribution approach.
arXiv Detail & Related papers (2020-07-01T21:27:38Z)
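The Break-It-Fix-It entry above names the two key ideas but not their implementation, so the following is only a toy, self-contained sketch of the data flow: `critic`, `toy_fixer`, and `toy_breaker` are made-up stand-ins (balanced-parenthesis checking and trivial string edits), not the paper's learned models or its compiler/parser-based critic.

```python
# Toy Break-It-Fix-It style round. Real BIFI uses learned seq2seq fixer/breaker
# models and a critic such as a compiler or parser; the stand-ins here only
# illustrate how critic-verified (bad, good) pairs accumulate.

def critic(code: str) -> bool:
    """Toy critic: code is 'good' if parentheses are balanced."""
    depth = 0
    for ch in code:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return False
    return depth == 0

def toy_fixer(code: str) -> str:
    """Toy 'fixer': append any missing closing parentheses."""
    depth = sum((ch == "(") - (ch == ")") for ch in code)
    return code + ")" * max(depth, 0)

def toy_breaker(code: str) -> str:
    """Toy 'breaker': drop the last closing parenthesis, if any."""
    idx = code.rfind(")")
    return code if idx < 0 else code[:idx] + code[idx + 1:]

def bifi_round(real_bad, fixer, breaker):
    """One BIFI-style round: run the fixer on real bad inputs, keep only the
    outputs the critic accepts as (bad, good) pairs, then use the breaker on
    good code to generate additional (bad, good) pairs."""
    paired = []
    for bad in real_bad:
        fixed = fixer(bad)
        if critic(fixed):               # key idea 1: critic-verified fixer outputs
            paired.append((bad, fixed))
    for _, good in list(paired):
        broken = breaker(good)
        if not critic(broken):          # key idea 2: breaker creates new bad inputs
            paired.append((broken, good))
    # In the real method the fixer and breaker are re-trained on `paired`
    # and the loop repeated; the toy stand-ins here are fixed functions.
    return paired

if __name__ == "__main__":
    bad_snippets = ["f(g(x)", "print((1 + 2)"]
    print(bifi_round(bad_snippets, toy_fixer, toy_breaker))
```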
This list is automatically generated from the titles and abstracts of the papers in this site.