Theoretically Principled Trade-off for Stateful Defenses against
Query-Based Black-Box Attacks
- URL: http://arxiv.org/abs/2307.16331v1
- Date: Sun, 30 Jul 2023 22:31:01 GMT
- Title: Theoretically Principled Trade-off for Stateful Defenses against
Query-Based Black-Box Attacks
- Authors: Ashish Hooda, Neal Mangaokar, Ryan Feng, Kassem Fawaz, Somesh Jha,
Atul Prakash
- Abstract summary: We offer a theoretical characterization of the trade-off between detection and false positive rates for stateful defenses.
We analyze the impact of this trade-off on the convergence of black-box attacks.
- Score: 26.905553663353825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial examples threaten the integrity of machine learning systems with
alarming success rates even under constrained black-box conditions. Stateful
defenses have emerged as an effective countermeasure, detecting potential
attacks by maintaining a buffer of recent queries and flagging new queries
that are too similar to it. However, these defenses fundamentally face a trade-off
between attack detection and false positive rates, and this trade-off is
typically optimized by hand-picking feature extractors and similarity
thresholds that empirically work well. Little is currently understood about
the formal limits of this trade-off or about which properties of the feature
extractor and underlying problem domain influence it. This work aims to
address this gap by offering a theoretical characterization of the trade-off
between detection and false positive rates for stateful defenses. We provide
upper bounds for detection rates of a general class of feature extractors and
analyze the impact of this trade-off on the convergence of black-box attacks.
We then support our theoretical findings with empirical evaluations across
multiple datasets and stateful defenses.
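The core mechanism described above admits a compact illustration. Below is a minimal sketch of the generic stateful-detection loop: buffer the feature embeddings of recent queries and flag any new query whose nearest buffered neighbor falls within a similarity threshold. The flattening feature extractor, the L2 distance measure, and the threshold value are illustrative placeholders, not the paper's tuned choices.

```python
# Minimal sketch of a stateful detector: buffer the features of recent
# queries and flag any new query whose nearest buffered neighbor lies
# within a similarity threshold. The extractor and threshold here are
# illustrative stand-ins, not the paper's tuned choices.
from collections import deque

import numpy as np

class StatefulDetector:
    def __init__(self, extract_features, threshold, buffer_size=1000):
        self.extract_features = extract_features   # query -> 1-D feature vector
        self.threshold = threshold                 # distance below which a query is flagged
        self.buffer = deque(maxlen=buffer_size)    # features of recent queries

    def check(self, query):
        """Return True if `query` is flagged as part of an attack sequence."""
        feat = self.extract_features(query)
        flagged = any(np.linalg.norm(feat - past) < self.threshold
                      for past in self.buffer)
        self.buffer.append(feat)                   # oldest entry evicted automatically
        return flagged

# Hypothetical extractor that simply flattens the input.
detector = StatefulDetector(lambda x: np.asarray(x, dtype=np.float32).ravel(),
                            threshold=0.5)
print(detector.check(np.zeros((3, 4))))  # False: buffer was empty
print(detector.check(np.zeros((3, 4))))  # True: identical repeat query
```

The threshold makes the trade-off concrete: loosening it (flagging queries at larger distances) raises the detection rate but also flags more benign queries that are merely similar by chance, which is exactly the detection/false-positive tension the paper formalizes.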
Related papers
- Certified Causal Defense with Generalizable Robustness [14.238441767523602]
We propose a novel certified defense framework GLEAN, which incorporates a causal perspective into the generalization problem in certified defense.
Our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label.
On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors.
arXiv Detail & Related papers (2024-08-28T00:14:09Z) - PuriDefense: Randomized Local Implicit Adversarial Purification for
Defending Black-box Query-based Attacks [15.842917276255141]
Black-box query-based attacks threaten Machine Learning as a Service (MLaaS) systems.
We propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at low inference cost.
Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications.
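As a rough illustration of the idea (not PuriDefense's actual models), the sketch below applies a purifier drawn at random from a small ensemble to each image patch. The Gaussian blurs of varying strength are hypothetical stand-ins for the paper's lightweight purification models.

```python
# Hedged sketch of randomized patch-wise purification: each patch is
# cleaned by a purifier drawn at random from an ensemble, injecting
# randomness that disrupts the smooth query feedback black-box attacks
# rely on. The blur "purifiers" are illustrative placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter

def purify(image, patch=8, sigmas=(0.5, 1.0, 1.5), rng=None):
    rng = rng or np.random.default_rng()
    out = image.copy()
    h, w = image.shape[:2]
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            sigma = rng.choice(sigmas)          # random purifier per patch
            out[y:y+patch, x:x+patch] = gaussian_filter(
                image[y:y+patch, x:x+patch], sigma=sigma)
    return out

x = np.random.rand(32, 32).astype(np.float32)
print(np.allclose(purify(x), purify(x)))        # almost surely False: randomized
```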
arXiv Detail & Related papers (2024-01-19T09:54:23Z) - AdvFAS: A robust face anti-spoofing framework against adversarial
examples [24.07755324680827]
We propose a robust face anti-spoofing framework, namely AdvFAS, that leverages two coupled scores to accurately distinguish between correctly detected and wrongly detected face images.
Experiments demonstrate the effectiveness of our framework in a variety of settings, including different attacks, datasets, and backbones.
arXiv Detail & Related papers (2023-08-04T02:47:19Z) - Towards Fair Classification against Poisoning Attacks [52.57443558122475]
We study the poisoning scenario where the attacker can insert a small fraction of samples into training data.
We propose a general and theoretically guaranteed framework that adapts traditional defense methods to fair classification against poisoning attacks.
arXiv Detail & Related papers (2022-10-18T00:49:58Z) - Attack-Agnostic Adversarial Detection [13.268960384729088]
We quantify the statistical deviation caused by adversarial perturbations in two aspects.
We show that our method can achieve an overall ROC AUC of 94.9%, 89.7%, and 94.6% on CIFAR10, CIFAR100, and SVHN, respectively, and has comparable performance to adversarial detectors trained with adversarial examples on most of the attacks.
arXiv Detail & Related papers (2022-06-01T13:41:40Z) - ADC: Adversarial attacks against object Detection that evade Context
consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z) - Balancing detectability and performance of attacks on the control
channel of Markov Decision Processes [77.66954176188426]
We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs).
This research is motivated by the recent interest of the research community in adversarial and poisoning attacks applied to MDPs and reinforcement learning (RL) methods.
arXiv Detail & Related papers (2021-09-15T09:13:10Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider non-robust features to be a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector.
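A minimal sketch of a likelihood-based detector in this spirit is shown below, assuming the representations of known adversarial examples form a cluster that a single Gaussian can model; the random features are placeholders for activations from a real network's penultimate layer.

```python
# Hedged sketch of likelihood-based detection over a representation
# cluster: fit a Gaussian to representations of known adversarial
# examples, then flag inputs whose representations are likely under it.
# The random "representations" are placeholders for real features.
import numpy as np
from scipy.stats import multivariate_normal

adv_reps = np.random.randn(500, 16) + 3.0        # stand-in adversarial cluster
dist = multivariate_normal(mean=adv_reps.mean(axis=0),
                           cov=np.cov(adv_reps, rowvar=False))
tau = np.quantile(dist.logpdf(adv_reps), 0.05)   # keep ~95% of known adversarials

def is_adversarial(rep):
    return dist.logpdf(rep) >= tau               # high likelihood => adversarial

print(is_adversarial(adv_reps[0]))               # True for ~95% of the cluster
print(is_adversarial(np.zeros(16)))              # False: far from the cluster
```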
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Advocating for Multiple Defense Strategies against Adversarial Examples [66.90877224665168]
It has been empirically observed that defense mechanisms designed to protect neural networks against $\ell_\infty$ adversarial examples offer poor performance against $\ell_2$ adversarial examples and vice versa.
In this paper we conduct a geometrical analysis that validates this observation.
Then, we provide a number of empirical insights to illustrate the effect of this phenomenon in practice.
arXiv Detail & Related papers (2020-12-04T14:42:46Z) - Adversarial Example Games [51.92698856933169]
Adversarial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2020-07-01T19:47:23Z) - Luring of transferable adversarial perturbations in the black-box
paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model and is designed to induce the luring effect.
Our deception-based method only needs access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)