Stateful Defenses for Machine Learning Models Are Not Yet Secure Against
Black-box Attacks
- URL: http://arxiv.org/abs/2303.06280v3
- Date: Tue, 26 Sep 2023 04:36:30 GMT
- Title: Stateful Defenses for Machine Learning Models Are Not Yet Secure Against
Black-box Attacks
- Authors: Ryan Feng, Ashish Hooda, Neal Mangaokar, Kassem Fawaz, Somesh Jha,
Atul Prakash
- Abstract summary: We show that stateful defense models (SDMs) are highly vulnerable to a new class of adaptive black-box attacks.
We propose a novel adaptive black-box attack strategy called Oracle-guided Adaptive Rejection Sampling (OARS).
We show how to apply the strategy to enhance six common black-box attacks, making them more effective against the current class of SDMs.
- Score: 28.93464970650329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has proposed stateful defense models (SDMs) as a compelling
strategy to defend against a black-box attacker who only has query access to
the model, as is common for online machine learning platforms. Such stateful
defenses aim to thwart black-box attacks by tracking the query history, detecting
and rejecting queries that are "similar", and thus preventing black-box attacks
from finding useful gradients or making progress towards adversarial examples
within a reasonable query budget. Recent SDMs
(e.g., Blacklight and PIHA) have shown remarkable success in defending against
state-of-the-art black-box attacks. In this paper, we show that SDMs are highly
vulnerable to a new class of adaptive black-box attacks. We propose a novel
adaptive black-box attack strategy called Oracle-guided Adaptive Rejection
Sampling (OARS) that involves two stages: (1) use initial query patterns to
infer key properties about an SDM's defense; and, (2) leverage those extracted
properties to design subsequent query patterns to evade the SDM's defense while
making progress towards finding adversarial inputs. OARS is broadly applicable
as an enhancement to existing black-box attacks - we show how to apply the
strategy to enhance six common black-box attacks to be more effective against
the current class of SDMs. For example, OARS-enhanced versions of black-box attacks
improved the attack success rate against recent stateful defenses from almost 0%
to almost 100% for multiple datasets within reasonable query budgets.
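The two-stage structure of OARS lends itself to a short sketch. The Python below is a minimal illustration only: the `query` oracle (returning both the model output and a rejection flag), the bisection probe, and the NES-style estimator are stand-ins chosen for clarity, not the paper's exact procedure.

```python
# Minimal illustration of the two OARS stages. Everything here is a sketch:
# `query(x)` is an assumed oracle returning (model_output, was_rejected).
import numpy as np

def probe_min_safe_sigma(query, x, lo=0.0, hi=0.5, steps=10):
    """Stage 1: bisect for (roughly) the smallest noise scale at which
    perturbed re-queries of x stop being rejected as 'similar'."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        _, rejected = query(x + np.random.normal(0.0, mid, x.shape))
        if rejected:
            lo = mid   # still too similar: spread queries out more
        else:
            hi = mid   # accepted: try a tighter scale
    return hi

def nes_gradient(query, loss, x, sigma, n_samples=20):
    """Stage 2: NES-style gradient estimate whose sampling scale respects
    the rejection threshold learned in stage 1."""
    grad = np.zeros_like(x)
    used = 0
    for _ in range(n_samples):
        u = np.random.normal(size=x.shape)
        out_pos, rej_pos = query(x + sigma * u)   # antithetic pair
        out_neg, rej_neg = query(x - sigma * u)
        if rej_pos or rej_neg:
            continue                              # skip rejected queries
        grad += (loss(out_pos) - loss(out_neg)) * u
        used += 1
    return grad / (2.0 * sigma * max(used, 1))
```

The key point the sketch captures is that the noise scale recovered in stage 1 feeds directly into the sampling scale of stage 2, keeping each query just dissimilar enough to pass the SDM's similarity check while still carrying gradient signal.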
Related papers
- Counter-Samples: A Stateless Strategy to Neutralize Black Box Adversarial Attacks [2.9815109163161204]
Our paper presents a novel defence against black box attacks, where attackers use the victim model as an oracle to craft their adversarial examples.
Unlike traditional preprocessing defences that rely on sanitizing input samples, our strategy counters the attack process itself.
We demonstrate that our approach is remarkably effective against state-of-the-art black box attacks and outperforms existing defences for both the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2024-03-14T10:59:54Z)
- Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence [34.35162562625252]
Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models.
We study a new paradigm of black-box attacks with provable guarantees.
This new black-box attack unveils significant vulnerabilities of machine learning models.
arXiv Detail & Related papers (2023-04-10T01:12:09Z)
- Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new attack on action recognition that addresses these shortcomings by generating perturbations.
Our method achieves 8% and 12% higher deception rates than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z)
- Small Input Noise is Enough to Defend Against Query-based Black-box Attacks [23.712389625037442]
In this paper, we show how Small Noise Defense (SND) can defend against query-based black-box attacks.
Even a small additive input noise can neutralize most query-based attacks.
Despite this strong defense capability, SND almost fully preserves the original clean accuracy and computational speed.
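The SND mechanism is simple enough to sketch directly. The following is a minimal sketch assuming a PyTorch classifier; the wrapper design and the sigma value are illustrative, not the paper's exact configuration.

```python
# A minimal sketch of the SND idea: perturb every query with small random
# noise before the forward pass, destabilizing the attacker's finite-
# difference gradient estimates. The sigma value here is illustrative.
import torch
import torch.nn as nn

class SmallNoiseDefense(nn.Module):
    def __init__(self, model: nn.Module, sigma: float = 0.01):
        super().__init__()
        self.model = model
        self.sigma = sigma  # kept small so clean accuracy is barely affected

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fresh noise on every call: repeated queries of near-identical
        # inputs receive slightly inconsistent outputs.
        return self.model(x + self.sigma * torch.randn_like(x))
```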
arXiv Detail & Related papers (2021-01-13T01:45:59Z)
- Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack).
NP-Attack could greatly decrease the query counts under the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z)
- Simple and Efficient Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes [80.9350052404617]
We propose a simple and efficient Bayesian Optimization (BO) based approach for developing black-box adversarial attacks.
Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace.
Our proposed approach consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries.
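The core dimensionality trick can be sketched briefly: optimize a coarse perturbation grid and upsample it to full resolution, so the optimizer searches a structured subspace rather than raw pixel space. In the sketch below, plain random search stands in for the Bayesian optimization loop, and the grid size, step, and budget are illustrative assumptions.

```python
# Sketch of the low-dimensional subspace trick. Random search stands in
# for the BO loop; grid size, step size, and budget are illustrative.
import numpy as np

def upsample(z, size):
    """Nearest-neighbour upsample of a coarse (d, d) grid to (size, size)."""
    idx = (np.arange(size) * z.shape[0]) // size
    return z[np.ix_(idx, idx)]

def subspace_search(loss, size=224, d=8, eps=0.05, budget=200):
    best_z, best_val = np.zeros((d, d)), float("inf")
    for _ in range(budget):
        z = np.clip(best_z + 0.1 * np.random.randn(d, d), -1.0, 1.0)
        val = loss(eps * upsample(z, size))  # one model query per candidate
        if val < best_val:
            best_z, best_val = z, val
    return eps * upsample(best_z, size)
```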
arXiv Detail & Related papers (2020-07-13T04:34:57Z)
- Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks [34.04323550970413]
We propose Blacklight, a new defense against query-based black-box adversarial attacks.
Blacklight detects query-based black-box attacks by flagging highly similar queries.
We evaluate Blacklight against eight state-of-the-art attacks, across a variety of models and image classification tasks.
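A much-simplified sketch of this similarity-detection idea follows; the chunked hashing, quantization step, and overlap threshold are illustrative stand-ins rather than Blacklight's exact probabilistic fingerprinting scheme.

```python
# Simplified similarity detection via query fingerprints: quantize the
# input, hash fixed-size chunks, and reject a query whose hash set overlaps
# a stored fingerprint beyond a threshold. All parameters are illustrative.
import hashlib
import numpy as np

class SimilarQueryDetector:
    def __init__(self, chunk=32, q=0.05, threshold=0.5):
        self.chunk, self.q, self.threshold = chunk, q, threshold
        self.history = []  # fingerprints of past queries

    def _fingerprint(self, x: np.ndarray) -> set:
        flat = np.round(x.ravel() / self.q).astype(np.int32)
        return {
            hashlib.sha256(flat[i:i + self.chunk].tobytes()).hexdigest()
            for i in range(0, flat.size - self.chunk + 1, self.chunk)
        }

    def check(self, x: np.ndarray) -> bool:
        """Return True (i.e., reject) if x is too similar to a past query."""
        fp = self._fingerprint(x)
        hit = any(len(fp & old) > self.threshold * len(fp) for old in self.history)
        self.history.append(fp)
        return hit
```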
arXiv Detail & Related papers (2020-06-24T20:52:24Z)
- RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
We present the Ray Searching attack (RayS), which greatly improves both the effectiveness and the efficiency of hard-label attacks.
RayS can also be used as a sanity check for possibly "falsely robust" models.
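The core hard-label subroutine of a ray-searching attack can be sketched as a binary search along one ray; the outer search over sign-pattern directions that makes RayS efficient is omitted, and `is_adversarial` is an assumed hard-label oracle.

```python
# Sketch of the core hard-label subroutine: binary-search the distance to
# the decision boundary along a fixed ray direction.
import numpy as np

def boundary_radius(is_adversarial, x, direction, r_max=10.0, tol=1e-3):
    d = direction / np.max(np.abs(direction))  # normalize for L_inf geometry
    if not is_adversarial(x + r_max * d):
        return float("inf")                    # boundary not within r_max
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_adversarial(x + mid * d):
            hi = mid                           # boundary is closer than mid
        else:
            lo = mid
    return hi
```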
arXiv Detail & Related papers (2020-06-23T07:01:50Z)
- Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
- Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data [96.92837098305898]
Black-box attacks aim to craft adversarial perturbations by querying input-output pairs of machine learning models.
Black-box attacks often suffer from the issue of query inefficiency due to the high dimensionality of the input space.
We propose a novel technique called the spanning attack, which constrains adversarial perturbations in a low-dimensional subspace via spanning an auxiliary unlabeled dataset.
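The subspace construction lends itself to a brief sketch; building the basis via QR decomposition is an illustrative choice, not necessarily the paper's exact method.

```python
# Sketch of the spanning idea: orthonormalize a small auxiliary set and keep
# every candidate perturbation inside the subspace it spans, cutting the
# effective search dimension of the black-box attack.
import numpy as np

def spanning_basis(aux: np.ndarray) -> np.ndarray:
    """aux: (k, dim) auxiliary unlabeled samples -> (dim, k) orthonormal basis."""
    Q, _ = np.linalg.qr(aux.T)
    return Q

def project_to_span(delta: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Project a candidate perturbation onto the spanned subspace."""
    return Q @ (Q.T @ delta)
```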
arXiv Detail & Related papers (2020-05-11T05:57:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.