Theoretically Principled Trade-off for Stateful Defenses against
Query-Based Black-Box Attacks
- URL: http://arxiv.org/abs/2307.16331v1
- Date: Sun, 30 Jul 2023 22:31:01 GMT
- Title: Theoretically Principled Trade-off for Stateful Defenses against
Query-Based Black-Box Attacks
- Authors: Ashish Hooda, Neal Mangaokar, Ryan Feng, Kassem Fawaz, Somesh Jha,
Atul Prakash
- Abstract summary: We offer a theoretical characterization of the trade-off between detection and false positive rates for stateful defenses.
We analyze the impact of this trade-off on the convergence of black-box attacks.
- Score: 26.905553663353825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial examples threaten the integrity of machine learning systems with
alarming success rates even under constrained black-box conditions. Stateful
defenses have emerged as an effective countermeasure, detecting potential
attacks by maintaining a buffer of recent queries and detecting new queries
that are too similar. However, these defenses fundamentally pose a trade-off
between attack detection and false positive rates, and this trade-off is
typically optimized by hand-picking feature extractors and similarity
thresholds that empirically work well. There is little current understanding as
to the formal limits of this trade-off and the exact properties of the feature
extractors/underlying problem domain that influence it. This work aims to
address this gap by offering a theoretical characterization of the trade-off
between detection and false positive rates for stateful defenses. We provide
upper bounds for detection rates of a general class of feature extractors and
analyze the impact of this trade-off on the convergence of black-box attacks.
We then support our theoretical findings with empirical evaluations across
multiple datasets and stateful defenses.
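The mechanism the abstract describes (keep a buffer of recent queries, flag any new query that is too similar to a stored one) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `StatefulDetector`, the flattening feature extractor, the L2 distance, and the parameters `threshold` and `buffer_size` are all assumptions made for the example.

```python
from collections import deque

import numpy as np


class StatefulDetector:
    """Hypothetical similarity-based stateful defense (illustrative only)."""

    def __init__(self, threshold, buffer_size=1000, feature_fn=None):
        # Queries whose feature distance to a buffered query falls below
        # `threshold` are flagged as a potential attack.
        self.threshold = threshold
        # Bounded buffer: the oldest queries are evicted automatically.
        self.buffer = deque(maxlen=buffer_size)
        # Assumed default feature extractor: flatten the input to a vector.
        self.feature_fn = feature_fn or (
            lambda x: np.asarray(x, dtype=float).ravel()
        )

    def query(self, x):
        """Return True if `x` is too similar to a recent query, then store it."""
        feat = self.feature_fn(x)
        flagged = any(
            np.linalg.norm(feat - past) < self.threshold for past in self.buffer
        )
        self.buffer.append(feat)
        return flagged


detector = StatefulDetector(threshold=0.5)
base = np.zeros(4)
print(detector.query(base))         # False: the buffer is empty
print(detector.query(base + 0.01))  # True: near-duplicate of the first query
print(detector.query(base + 10.0))  # False: far from every buffered query
```

In this sketch, raising `threshold` flags more attack queries but also more benign queries that happen to resemble each other, which is the detection versus false positive trade-off the paper analyzes.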
Related papers
- The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense [56.32083100401117]
We investigate why Vision Large Language Models (VLLMs) are prone to jailbreak attacks.
We then make a key observation: existing defense mechanisms suffer from an over-prudence problem.
We find that the two representative evaluation methods for jailbreak often exhibit chance agreement.
arXiv Detail & Related papers (2024-11-13T07:57:19Z)
- Certified Causal Defense with Generalizable Robustness [14.238441767523602]
We propose a novel certified defense framework GLEAN, which incorporates a causal perspective into the generalization problem in certified defense.
Our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label.
On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors.
arXiv Detail & Related papers (2024-08-28T00:14:09Z)
- PuriDefense: Randomized Local Implicit Adversarial Purification for
Defending Black-box Query-based Attacks [15.842917276255141]
Black-box query-based attacks threaten Machine Learning as a Service (MLaaS) systems.
We propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at a low level of inference cost.
Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications.
arXiv Detail & Related papers (2024-01-19T09:54:23Z)
- AdvFAS: A robust face anti-spoofing framework against adversarial
examples [24.07755324680827]
We propose a robust face anti-spoofing framework, namely AdvFAS, that leverages two coupled scores to accurately distinguish between correctly detected and wrongly detected face images.
Experiments demonstrate the effectiveness of our framework in a variety of settings, including different attacks, datasets, and backbones.
arXiv Detail & Related papers (2023-08-04T02:47:19Z)
- Towards Fair Classification against Poisoning Attacks [52.57443558122475]
We study the poisoning scenario where the attacker can insert a small fraction of samples into training data.
We propose a general and theoretically guaranteed framework which accommodates traditional defense methods to fair classification against poisoning attacks.
arXiv Detail & Related papers (2022-10-18T00:49:58Z)
- Attack-Agnostic Adversarial Detection [13.268960384729088]
We quantify the statistical deviation caused by adversarial perturbations in two aspects.
We show that our method can achieve an overall ROC AUC of 94.9%, 89.7%, and 94.6% on CIFAR10, CIFAR100, and SVHN, respectively, and has comparable performance to adversarial detectors trained with adversarial examples on most of the attacks.
arXiv Detail & Related papers (2022-06-01T13:41:40Z)
- ADC: Adversarial attacks against object Detection that evade Context
consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that robustly modeling context and checking its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z)
- Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce it is possible to find a cluster in representation space corresponding to the property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
- Advocating for Multiple Defense Strategies against Adversarial Examples [66.90877224665168]
It has been empirically observed that defense mechanisms designed to protect neural networks against $\ell_\infty$ adversarial examples offer poor performance.
In this paper we conduct a geometrical analysis that validates this observation.
Then, we provide a number of empirical insights to illustrate the effect of this phenomenon in practice.
arXiv Detail & Related papers (2020-12-04T14:42:46Z)
- Adversarial Example Games [51.92698856933169]
Adversarial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2020-07-01T19:47:23Z)
- Luring of transferable adversarial perturbations in the black-box
paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model, and is designed to induce the luring effect.
Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.