Randomness in ML Defenses Helps Persistent Attackers and Hinders
Evaluators
- URL: http://arxiv.org/abs/2302.13464v1
- Date: Mon, 27 Feb 2023 01:33:31 GMT
- Title: Randomness in ML Defenses Helps Persistent Attackers and Hinders
Evaluators
- Authors: Keane Lucas, Matthew Jagielski, Florian Tramèr, Lujo Bauer, Nicholas Carlini
- Abstract summary: It is becoming increasingly imperative to design robust ML defenses.
Recent work has found that many defenses that initially resist state-of-the-art attacks can be broken by an adaptive adversary.
We take steps to simplify the design of defenses and argue that white-box defenses should eschew randomness when possible.
- Score: 49.52538232104449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is becoming increasingly imperative to design robust ML defenses. However,
recent work has found that many defenses that initially resist state-of-the-art
attacks can be broken by an adaptive adversary. In this work we take steps to
simplify the design of defenses and argue that white-box defenses should eschew
randomness when possible. We begin by illustrating a new issue with the
deployment of randomized defenses that reduces their security compared to their
deterministic counterparts. We then provide evidence that making defenses
deterministic simplifies robustness evaluation, without reducing the
effectiveness of a truly robust defense. Finally, we introduce a new defense
evaluation framework that leverages a defense's deterministic nature to better
evaluate its adversarial robustness.
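To make the issue concrete: a defense that randomizes its decision per query can report a respectable detection rate under a one-shot evaluation, yet an attacker who simply resubmits the same input eventually gets through. The sketch below is illustrative only and assumes a hypothetical noise-based detector (the threshold, noise scale, and query budget are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_detector(x, threshold=1.2):
    """Hypothetical randomized defense: add Gaussian noise before a
    norm-based anomaly score, so the same input is sometimes flagged
    and sometimes not."""
    noisy = x + rng.normal(scale=0.5, size=x.shape)
    return np.linalg.norm(noisy) > threshold        # True = reject the query

# A borderline adversarial input that sits near the score's threshold.
x_adv = np.full(4, 0.5)

# Detection rate as seen by a one-shot (non-persistent) evaluation.
detect_rate = np.mean([randomized_detector(x_adv) for _ in range(10_000)])

# A persistent attacker just resubmits the same input until it gets through.
k = 20
print(f"single-query bypass probability: {1 - detect_rate:.3f}")
print(f"bypass probability within {k} resubmissions: {1 - detect_rate ** k:.3f}")
```

Because the per-query bypass probability compounds across resubmissions, the effective security against a persistent attacker is far lower than the single-query detection rate suggests; a deterministic defense would either always reject this input or always accept it.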
Related papers
- Position: Towards Resilience Against Adversarial Examples [42.09231029292568]
We provide a definition of adversarial resilience and outline considerations of designing an adversarially resilient defense.
We then introduce a subproblem of adversarial resilience which we call continual adaptive robustness.
We demonstrate the connection between continual adaptive robustness and previously studied problems of multiattack robustness and unforeseen attack robustness.
arXiv Detail & Related papers (2024-05-02T14:58:44Z) - Hindering Adversarial Attacks with Multiple Encrypted Patch Embeddings [13.604830818397629]
We propose a new key-based defense focusing on both efficiency and robustness.
We build upon the previous defense with two major improvements: (1) efficient training and (2) optional randomization.
Experiments were carried out on the ImageNet dataset, and the proposed defense was evaluated against an arsenal of state-of-the-art attacks.
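To illustrate what a key-based defense of this kind can look like, here is a minimal sketch assuming a block-wise patch permutation seeded by a secret key; the function name, patch size, and permutation scheme are hypothetical and not the paper's exact method. Keeping the key fixed gives a deterministic transform, while re-drawing it per query corresponds to the optional randomization mentioned above.

```python
import numpy as np

def key_based_patch_shuffle(img, key, patch=16):
    """Hypothetical key-based transform: split the image into non-overlapping
    patches and permute them with a permutation derived from a secret key.
    A model trained on shuffled images forces an attacker without the key to
    optimize against the wrong view of the input."""
    rng = np.random.default_rng(key)            # the secret key seeds the permutation
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    patches = (img[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, patch, patch, c))
    perm = rng.permutation(gh * gw)
    shuffled = patches[perm].reshape(gh, gw, patch, patch, c)
    return shuffled.transpose(0, 2, 1, 3, 4).reshape(gh * patch, gw * patch, c)

img = np.random.rand(224, 224, 3).astype(np.float32)   # ImageNet-sized dummy input
protected = key_based_patch_shuffle(img, key=1234)      # fixed key = deterministic
print(protected.shape)                                   # (224, 224, 3)
```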
arXiv Detail & Related papers (2023-09-04T14:08:34Z) - Increasing Confidence in Adversarial Robustness Evaluations [53.2174171468716]
We propose a test to identify weak attacks and thus weak defense evaluations.
Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample.
For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it.
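The spirit of such a test can be illustrated with a linear model, for which the existence of an adversarial example inside the budget is provable in closed form: any sound attack must find it, and an attack that does not is flagged as weak. The sketch below uses this simplified linear construction, not the paper's actual network modification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear binary classifier: predict sign(w @ x).
w = rng.normal(size=50)
eps = 0.1

def margin(x):
    return w @ x

# Within an L_inf ball of radius eps the margin can move by at most
# eps * ||w||_1, so an adversarial example provably exists whenever
# |margin(x)| < eps * ||w||_1. We construct such a point deliberately.
x = 0.5 * eps * np.sign(w)                  # margin(x) = 0.5 * eps * ||w||_1
assert abs(margin(x)) < eps * np.abs(w).sum()

def weak_attack(x):
    # Deliberately weak: correct direction, but only 10% of the budget.
    return x - 0.1 * eps * np.sign(w) * np.sign(margin(x))

def strong_attack(x):
    # Optimal for a linear model: full-budget step against the margin.
    return x - eps * np.sign(w) * np.sign(margin(x))

for name, attack in [("weak", weak_attack), ("strong", strong_attack)]:
    flipped = np.sign(margin(attack(x))) != np.sign(margin(x))
    print(f"{name:>6} attack finds the guaranteed adversarial example: {flipped}")
```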
arXiv Detail & Related papers (2022-06-28T13:28:13Z) - Evaluating the Adversarial Robustness of Adaptive Test-time Defenses [60.55448652445904]
We categorize such adaptive test-time defenses and explain their potential benefits and drawbacks.
Unfortunately, none significantly improve upon static models when evaluated appropriately.
Some even weaken the underlying static model while simultaneously increasing inference cost.
arXiv Detail & Related papers (2022-02-28T12:11:40Z) - Can Adversarial Training Be Manipulated By Non-Robust Features? [64.73107315313251]
Adversarial training, originally designed to resist test-time adversarial examples, has been shown to be promising in mitigating training-time availability attacks.
We identify a novel threat model named stability attacks, which aims to hinder robust availability by slightly perturbing the training data.
Under this threat model, we find that adversarial training using a conventional defense budget $\epsilon$ provably fails to provide test robustness in a simple statistical setting.
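For context, "adversarial training using a conventional defense budget" refers to min-max training in which the inner perturbations are confined to an L_inf ball of radius $\epsilon$. The sketch below shows that baseline procedure on a toy logistic model with analytic gradients (the FGSM inner step is optimal for linear models); it illustrates the defense being attacked, not the stability attack itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary task with labels in {-1, +1} and a linear (logistic) model,
# so both the weight and per-example input gradients are analytic.
d, n = 20, 256
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

eps, lr, steps = 0.1, 0.1, 200           # eps: the conventional defense budget
w = np.zeros(d)

def grads(w, X, y):
    """Gradients of the logistic loss log(1 + exp(-y * Xw)) w.r.t. w and X."""
    s = -y / (1.0 + np.exp(y * (X @ w)))           # per-example dloss/d(Xw)
    return (s[:, None] * X).mean(axis=0), s[:, None] * w[None, :]

for _ in range(steps):
    # Inner maximization: worst-case L_inf perturbation within the budget.
    # For a linear model a single signed-gradient (FGSM) step is optimal.
    _, gx = grads(w, X, y)
    X_adv = X + eps * np.sign(gx)
    # Outer minimization: gradient step on the adversarial examples.
    gw, _ = grads(w, X_adv, y)
    w -= lr * gw

# Robust accuracy on worst-case perturbations of the training set.
_, gx = grads(w, X, y)
acc = np.mean(np.sign((X + eps * np.sign(gx)) @ w) == y)
print(f"robust training accuracy at eps={eps}: {acc:.2f}")
```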
arXiv Detail & Related papers (2022-01-31T16:25:25Z) - MAD-VAE: Manifold Awareness Defense Variational Autoencoder [0.0]
We introduce several methods to improve the robustness of defense models.
Extensive experiments on the MNIST dataset demonstrate the effectiveness of our algorithms.
We also discuss the applicability of existing adversarial latent-space attacks, which may have a significant flaw.
arXiv Detail & Related papers (2020-10-31T09:04:25Z) - A Game Theoretic Analysis of Additive Adversarial Attacks and Defenses [4.94950858749529]
We propose a game-theoretic framework for studying attacks and defenses which exist in equilibrium.
We show how this equilibrium defense can be approximated given finitely many samples from a data-generating distribution.
arXiv Detail & Related papers (2020-09-14T15:51:15Z) - Harnessing adversarial examples with a surprisingly simple defense [47.64219291655723]
I introduce a very simple method to defend against adversarial examples.
The basic idea is to raise the slope of the ReLU function at test time.
Experiments over MNIST and CIFAR-10 datasets demonstrate the effectiveness of the proposed defense.
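A minimal sketch of the idea, assuming a tiny fully connected network (the architecture and slope value are illustrative, not from the paper): training uses the standard ReLU, and the defense only swaps in a steeper activation slope * max(x, 0) with slope > 1 at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x, slope=1.0):
    """Standard ReLU when slope=1.0; the defense uses slope > 1 at test time."""
    return slope * np.maximum(x, 0.0)

def mlp_forward(x, params, slope=1.0):
    """Tiny two-layer network; only the hidden activation's slope is changed."""
    W1, b1, W2, b2 = params
    return relu(x @ W1 + b1, slope=slope) @ W2 + b2

params = (rng.normal(size=(784, 64)), rng.normal(size=64),
          rng.normal(size=(64, 10)), rng.normal(size=10))
x = rng.normal(size=784)

logits_train = mlp_forward(x, params, slope=1.0)   # slope used during training
logits_test = mlp_forward(x, params, slope=5.0)    # steeper slope at test time
print(np.argmax(logits_train), np.argmax(logits_test))
```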
arXiv Detail & Related papers (2020-04-26T03:09:42Z) - Reliable evaluation of adversarial robustness with an ensemble of
diverse parameter-free attacks [65.20660287833537]
In this paper we propose two extensions of the PGD attack that overcome failures due to a suboptimal step size and problems with the objective function.
We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.
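A simplified sketch of the step-size fix in isolation (the halving schedule and the toy logistic objective are assumptions, not the paper's exact algorithm): the attack shrinks its step whenever the best loss stops improving and always returns the best iterate found, which avoids the failure mode of a fixed, poorly chosen step size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable target: logistic loss of a fixed linear model.
w = rng.normal(size=30)
x0, y = rng.normal(size=30), 1.0
eps = 0.25

def loss_and_grad(x):
    z = y * (w @ x)
    return np.log1p(np.exp(-z)), -y * w / (1.0 + np.exp(z))

def pgd_adaptive(x0, steps=50):
    """L_inf PGD that halves its step size whenever the loss stops improving,
    a simplified stand-in for the step-size fix described above."""
    x, alpha = x0.copy(), eps / 4
    best_loss, best_x = -np.inf, x0.copy()
    for _ in range(steps):
        loss, g = loss_and_grad(x)
        if loss > best_loss:
            best_loss, best_x = loss, x.copy()
        else:
            alpha *= 0.5                         # shrink the step when stuck
        # Ascent step followed by projection back onto the L_inf ball.
        x = np.clip(x + alpha * np.sign(g), x0 - eps, x0 + eps)
    return best_x, best_loss

x_adv, adv_loss = pgd_adaptive(x0)
print(f"clean loss {loss_and_grad(x0)[0]:.3f} -> adversarial loss {adv_loss:.3f}")
```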
arXiv Detail & Related papers (2020-03-03T18:15:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.