Increasing Confidence in Adversarial Robustness Evaluations
- URL: http://arxiv.org/abs/2206.13991v1
- Date: Tue, 28 Jun 2022 13:28:13 GMT
- Title: Increasing Confidence in Adversarial Robustness Evaluations
- Authors: Roland S. Zimmermann, Wieland Brendel, Florian Tramer, Nicholas
Carlini
- Abstract summary: We propose a test to identify weak attacks and thus weak defense evaluations.
Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample.
For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it.
- Score: 53.2174171468716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hundreds of defenses have been proposed to make deep neural networks robust
against minimal (adversarial) input perturbations. However, only a handful of
these defenses have held up to their claims, because correctly evaluating
robustness is extremely challenging: weak attacks often fail to find
adversarial examples even when they exist, thereby making a vulnerable network
look robust. In this paper, we propose a test to identify weak attacks, and
thus weak defense evaluations. Our test slightly modifies a neural network to
guarantee the existence of an adversarial example for every sample.
Consequently, any correct attack must succeed in breaking this modified
network. For eleven out of thirteen previously published defenses, the
original evaluation of the defense fails our test, while stronger attacks that
break these defenses pass it. We hope that attack unit tests such as ours will
become a major component of future robustness evaluations and increase
confidence in an empirical field that is currently riddled with skepticism.
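To make the test concrete, below is a minimal, hypothetical sketch of such an attack unit test in PyTorch. It is not the paper's actual construction; it only illustrates the core idea: modify a classifier so that every correctly classified sample provably has an adversarial example within the perturbation budget, then measure whether a given attack actually finds one. The wrapper class, the assumed `attack_fn(model_fn, x, y, epsilon)` interface, and the 0.9 trigger threshold are all illustrative assumptions.

```python
# Minimal, hypothetical sketch of an attack unit test (not the paper's own
# construction). A wrapper plants a guaranteed vulnerability: whenever an
# input moves far enough (in L-inf) away from its clean version, the
# prediction is forced to a different class, so an adversarial example
# provably exists for every correctly classified sample.
import torch
import torch.nn as nn


class PlantedVulnerabilityWrapper(nn.Module):
    def __init__(self, base_model: nn.Module, epsilon: float, num_classes: int):
        super().__init__()
        self.base_model = base_model
        self.epsilon = epsilon
        self.num_classes = num_classes

    def forward(self, x: torch.Tensor, x_clean: torch.Tensor) -> torch.Tensor:
        logits = self.base_model(x)
        clean_pred = self.base_model(x_clean).argmax(dim=1)
        # Inputs perturbed by (almost) the full L-inf budget trigger the
        # planted misclassification; everything else behaves as before.
        moved = (x - x_clean).abs().amax(dim=(1, 2, 3)) >= 0.9 * self.epsilon
        target = (clean_pred + 1) % self.num_classes
        forced = torch.full_like(logits, -1e4).scatter(1, target.unsqueeze(1), 1e4)
        return torch.where(moved.unsqueeze(1), forced, logits)


def attack_unit_test(attack_fn, model, x_clean, y, epsilon, num_classes=10):
    """Runs the attack on the modified network and returns its success rate.

    A strong attack should score close to 1.0 here; an attack that falls
    well short is too weak to support a robustness claim."""
    wrapped = PlantedVulnerabilityWrapper(model, epsilon, num_classes)
    x_adv = attack_fn(lambda z: wrapped(z, x_clean), x_clean, y, epsilon)
    preds = wrapped(x_adv, x_clean).argmax(dim=1)
    return (preds != y).float().mean().item()
```

Design note for the sketch: because the forced class always differs from the clean prediction, any attack that properly explores the epsilon ball can flip every correctly classified sample on the wrapped model; failing to do so flags the attack, not the defense.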
Related papers
- Closing the Gap: Achieving Better Accuracy-Robustness Tradeoffs against Query-Based Attacks [1.54994260281059]
We show how to efficiently establish, at test time, a solid tradeoff between robustness and accuracy when mitigating query-based attacks.
Our approach is independent of training and supported by theory.
arXiv Detail & Related papers (2023-12-15T17:02:19Z)
- Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
- Randomness in ML Defenses Helps Persistent Attackers and Hinders Evaluators [49.52538232104449]
It is becoming increasingly imperative to design robust ML defenses.
Recent work has found that many defenses that initially resist state-of-the-art attacks can be broken by an adaptive adversary.
We take steps to simplify the design of defenses and argue that white-box defenses should eschew randomness when possible.
arXiv Detail & Related papers (2023-02-27T01:33:31Z)
- Can Adversarial Training Be Manipulated By Non-Robust Features? [64.73107315313251]
Adversarial training, originally designed to resist test-time adversarial examples, has shown promise in mitigating training-time availability attacks.
We identify a novel threat model named stability attacks, which aims to hinder robust availability by slightly perturbing the training data.
Under this threat, we find that adversarial training using a conventional defense budget $\epsilon$ provably fails to provide test robustness in a simple statistical setting.
arXiv Detail & Related papers (2022-01-31T16:25:25Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks [65.20660287833537]
In this paper we propose two extensions of the PGD attack that overcome failures due to suboptimal step size and problems with the objective function.
We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.
arXiv Detail & Related papers (2020-03-03T18:15:55Z)
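The last entry above corresponds to the AutoAttack ensemble, one of the stronger off-the-shelf evaluations of the kind the main abstract refers to. As a hedged usage sketch (not part of the paper itself), the snippet below shows how such a parameter-free ensemble is typically run with the publicly available auto-attack package; the toy model, random data, and the 8/255 budget are placeholder assumptions.

```python
# Hedged usage sketch: evaluating a classifier with the parameter-free
# AutoAttack ensemble (https://github.com/fra31/auto-attack). The model and
# data below are placeholders; substitute the defense under evaluation and
# its real test set.
import torch
import torch.nn as nn
from autoattack import AutoAttack

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
x_test = torch.rand(64, 3, 32, 32)      # images scaled to [0, 1]
y_test = torch.randint(0, 10, (64,))    # integer class labels

# The 'standard' version runs APGD-CE, targeted APGD-DLR, targeted FAB,
# and Square Attack; switch device to 'cuda' when a GPU is available.
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard',
                       device='cpu')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=64)

with torch.no_grad():
    robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
print(f"robust accuracy under AutoAttack: {robust_acc:.3f}")
```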
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.