Evading Black-box Classifiers Without Breaking Eggs
- URL: http://arxiv.org/abs/2306.02895v2
- Date: Wed, 14 Feb 2024 13:46:01 GMT
- Title: Evading Black-box Classifiers Without Breaking Eggs
- Authors: Edoardo Debenedetti, Nicholas Carlini, and Florian Tramèr
- Abstract summary: Decision-based evasion attacks repeatedly query a black-box classifier to generate adversarial examples.
Prior work measures the cost of such attacks by the total number of queries made to the classifier.
We argue this metric is flawed and design new attacks that reduce the number of bad queries by $1.5$-$7.3\times$.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision-based evasion attacks repeatedly query a black-box classifier to
generate adversarial examples. Prior work measures the cost of such attacks by
the total number of queries made to the classifier. We argue this metric is
flawed. Most security-critical machine learning systems aim to weed out "bad"
data (e.g., malware, harmful content, etc). Queries to such systems carry a
fundamentally asymmetric cost: queries detected as "bad" come at a higher cost
because they trigger additional security filters, e.g., usage throttling or
account suspension. Yet, we find that existing decision-based attacks issue a
large number of "bad" queries, which likely renders them ineffective against
security-critical systems. We then design new attacks that reduce the number of
bad queries by $1.5$-$7.3\times$, but often at a significant increase in total
(non-bad) queries. We thus pose it as an open problem to build black-box
attacks that are more effective under realistic cost metrics.
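The asymmetric cost model argued for in the abstract can be made concrete with a small sketch. All names and the weight below are illustrative, not taken from the paper: the point is only that an attack issuing fewer flagged queries can be cheaper under the asymmetric metric even while issuing more queries in total.

```python
from dataclasses import dataclass

@dataclass
class AsymmetricQueryCounter:
    """Tracks attack cost when 'bad' (flagged) queries cost more than
    benign ones. The 100:1 weight is illustrative, not from the paper."""
    bad_weight: float = 100.0   # a flagged query risks throttling or bans
    good_weight: float = 1.0
    bad: int = 0
    good: int = 0

    def record(self, flagged_bad: bool) -> None:
        if flagged_bad:
            self.bad += 1
        else:
            self.good += 1

    @property
    def total_queries(self) -> int:
        # the metric prior work reports
        return self.bad + self.good

    @property
    def asymmetric_cost(self) -> float:
        # the metric the paper argues is more realistic
        return self.bad_weight * self.bad + self.good_weight * self.good

# Two hypothetical attacks reaching the same adversarial example:
a = AsymmetricQueryCounter()        # classic attack: many bad queries
for _ in range(500):
    a.record(True)
for _ in range(500):
    a.record(False)

b = AsymmetricQueryCounter()        # "stealthy" attack: fewer bad queries
for _ in range(100):
    b.record(True)
for _ in range(2000):
    b.record(False)

# b issues more total queries (2100 vs 1000) yet is far cheaper
# under the asymmetric metric (12000.0 vs 50500.0)
print(a.total_queries, a.asymmetric_cost)
print(b.total_queries, b.asymmetric_cost)
```

This mirrors the trade-off the abstract describes: the new attacks cut bad queries by $1.5$-$7.3\times$, often at a significant increase in total (non-bad) queries.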
Related papers
- Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry [11.292557925135283]
We propose a general framework for decision-based attacks under asymmetric query costs.
We design efficient algorithms that minimize total attack cost by balancing different query types.
Our method achieves consistently lower total query cost and smaller perturbations than existing approaches.
arXiv Detail & Related papers (2025-06-07T22:02:27Z)
- Benchmarking Misuse Mitigation Against Covert Adversaries [80.74502950627736]
Existing language model safety evaluations focus on overt attacks and low-stakes tasks.
We develop Benchmarks for Stateful Defenses (BSD), a data generation pipeline that automates evaluations of covert attacks and corresponding defenses.
Our evaluations indicate that decomposition attacks are effective misuse enablers, and highlight stateful defenses as a countermeasure.
arXiv Detail & Related papers (2025-06-06T17:33:33Z)
- Query Provenance Analysis: Efficient and Robust Defense against Query-based Black-box Attacks [11.32992178606254]
We propose a novel approach, Query Provenance Analysis (QPA), for more robust and efficient Stateful Defense Models (SDMs).
QPA encapsulates the historical relationships among queries as the sequence feature to capture the fundamental difference between benign and adversarial query sequences.
We evaluate QPA compared with two baselines, BlackLight and PIHA, on four widely used datasets with six query-based black-box attack algorithms.
arXiv Detail & Related papers (2024-05-31T06:56:54Z)
- BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack [22.408968332454062]
We study the unique, less well-understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries.
We develop BruSLeAttack, a new, faster (more query-efficient) algorithm for this problem.
Our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems.
arXiv Detail & Related papers (2024-04-08T08:59:26Z)
- Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems [56.64374584117259]
Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries.
We develop techniques to (i) reverse-engineer the preprocessor and then (ii) use this extracted information to attack the end-to-end system.
Our preprocessor extraction method requires only a few hundred queries, and our preprocessor-aware attacks recover the same efficacy as when attacking the model alone.
arXiv Detail & Related papers (2022-10-07T03:10:34Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Small Input Noise is Enough to Defend Against Query-based Black-box Attacks [23.712389625037442]
In this paper, we show how Small Noise Defense (SND) can defend against query-based black-box attacks.
Even a small additive input noise can neutralize most query-based attacks.
Despite its strong defensive ability, SND nearly maintains the original clean accuracy and computational speed.
arXiv Detail & Related papers (2021-01-13T01:45:59Z)
- SurFree: a fast surrogate-free black-box attack [17.323638042215013]
Adversarial examples are slightly modified inputs that are misclassified while remaining perceptually close to their originals.
The last couple of years have witnessed a striking decrease in the number of queries a black-box attack submits to the target.
This paper presents SurFree, a geometrical approach that achieves a similarly drastic reduction in the number of queries in the hardest setup: black-box decision-based attacks.
arXiv Detail & Related papers (2020-11-25T15:08:19Z)
- Simple and Efficient Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes [80.9350052404617]
We propose a simple and efficient Bayesian Optimization (BO)-based approach for developing black-box adversarial attacks.
Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace.
Our proposed approach consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries.
arXiv Detail & Related papers (2020-07-13T04:34:57Z)
- Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks [34.04323550970413]
We propose Blacklight, a new defense against query-based black-box adversarial attacks.
Blacklight detects query-based black-box attacks by identifying highly similar queries.
We evaluate Blacklight against eight state-of-the-art attacks, across a variety of models and image classification tasks.
arXiv Detail & Related papers (2020-06-24T20:52:24Z)
- RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
We present the Ray Searching attack (RayS), which greatly improves both the effectiveness and efficiency of hard-label attacks.
RayS attack can also be used as a sanity check for possible "falsely robust" models.
arXiv Detail & Related papers (2020-06-23T07:01:50Z)
- Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data [96.92837098305898]
Black-box attacks aim to craft adversarial perturbations by querying input-output pairs of machine learning models.
Black-box attacks often suffer from the issue of query inefficiency due to the high dimensionality of the input space.
We propose a novel technique called the spanning attack, which constrains adversarial perturbations in a low-dimensional subspace via spanning an auxiliary unlabeled dataset.
arXiv Detail & Related papers (2020-05-11T05:57:15Z)
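Several of the defenses above (e.g., Blacklight, QPA) are stateful: they flag clients whose query history contains near-duplicate inputs, which is precisely what makes "bad" queries costly under the asymmetric model of the main paper. A minimal sketch of that idea follows; the quantization-hash fingerprint and bucket size are illustrative assumptions, not the scheme of any cited system.

```python
import hashlib

class SimilarityDefense:
    """Flags a query if a near-duplicate was seen before.
    Near-duplicates are detected by hashing a coarsely quantized
    copy of the input; the scheme and bucket size are illustrative."""

    def __init__(self, bucket: float = 0.1):
        self.bucket = bucket
        self.seen: set[str] = set()

    def _fingerprint(self, x: list[float]) -> str:
        # Inputs that differ by less than ~bucket/2 per coordinate
        # tend to land in the same quantization cell.
        quantized = tuple(round(v / self.bucket) for v in x)
        return hashlib.sha256(repr(quantized).encode()).hexdigest()

    def is_suspicious(self, x: list[float]) -> bool:
        fp = self._fingerprint(x)
        if fp in self.seen:
            return True   # near-duplicate of an earlier query
        self.seen.add(fp)
        return False

d = SimilarityDefense()
print(d.is_suspicious([0.50, 0.20]))   # first query: not flagged
print(d.is_suspicious([0.51, 0.21]))   # tiny perturbation, same cell: flagged
print(d.is_suspicious([0.90, 0.90]))   # unrelated query: not flagged
```

Iterative decision-based attacks issue exactly the kind of tightly clustered query sequences this sketch flags, which is why the papers above treat flagged queries as carrying an elevated cost.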
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.