Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box
Score-Based Query Attacks
- URL: http://arxiv.org/abs/2205.12134v1
- Date: Tue, 24 May 2022 15:10:50 GMT
- Title: Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box
Score-Based Query Attacks
- Authors: Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin
Huang
- Abstract summary: We propose a novel defense, namely Adversarial Attack on Attackers (AAA), to confound SQAs towards incorrect attack directions.
In this way, SQAs are prevented regardless of the model's worst-case robustness.
- Score: 25.053383672515697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The score-based query attacks (SQAs) pose practical threats to deep neural
networks by crafting adversarial perturbations within dozens of queries, only
using the model's output scores. Nonetheless, we note that if the loss trend of
the outputs is slightly perturbed, SQAs could be easily misled and thereby
become much less effective. Following this idea, we propose a novel defense,
namely Adversarial Attack on Attackers (AAA), to confound SQAs towards
incorrect attack directions by slightly modifying the output logits. In this
way, (1) SQAs are prevented regardless of the model's worst-case robustness;
(2) the original model predictions are hardly changed, i.e., no degradation on
clean accuracy; (3) the calibration of confidence scores can be improved
simultaneously. Extensive experiments are provided to verify the above
advantages. For example, by setting $\ell_\infty=8/255$ on CIFAR-10, our
proposed AAA helps WideResNet-28 secure $80.59\%$ accuracy under Square attack
($2500$ queries), while the best prior defense (i.e., adversarial training)
only attains $67.44\%$. Since AAA attacks SQA's general greedy strategy, such
advantages of AAA over 8 defenses can be consistently observed on 8
CIFAR-10/ImageNet models under 6 SQAs, using different attack targets and
bounds. Moreover, AAA calibrates better without hurting the accuracy. Our code
will be released.
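The idea of misleading a greedy score-based attacker by perturbing the loss trend can be illustrated with a toy sketch. This is not the authors' implementation: the function names `margin` and `reverse_margin_trend`, the interval width `tau`, and the slope `alpha` are assumptions made for illustration, and the calibration component described in the abstract is omitted. The sketch splits the top-1/top-2 logit margin into intervals of width `tau` and mirrors the margin around the centre of its interval, so that within an interval a step that truly shrinks the margin is reported as growing it, while the predicted class is unchanged.

```python
import numpy as np


def margin(logits):
    """Gap between the largest and second-largest logit."""
    top2 = np.sort(np.asarray(logits, dtype=float))[-2:]
    return top2[1] - top2[0]


def reverse_margin_trend(logits, tau=1.0, alpha=1.0):
    """Post-process logits so the reported margin moves opposite to the true one.

    Hypothetical sketch: `tau` (interval width) and `alpha` (reversed slope,
    assumed to lie in (0, 1]) are illustrative parameters, not values from
    the paper.
    """
    logits = np.asarray(logits, dtype=float)
    m = margin(logits)
    # Centre of the interval [k * tau, (k + 1) * tau) that contains m.
    center = (np.floor(m / tau) + 0.5) * tau
    # Mirror the margin around the centre; the target stays positive,
    # so the argmax (the prediction) is preserved.
    target = center - alpha * (m - center)
    out = logits.copy()
    out[np.argmax(logits)] += target - m
    return out


# Two successive queries: the true margin shrinks, i.e. the attack is progressing.
before = np.array([2.0, 1.2, 0.1])
after = np.array([2.0, 1.4, 0.1])

reported_before = reverse_margin_trend(before)
reported_after = reverse_margin_trend(after)

print(margin(before), margin(after))                    # true margins
print(margin(reported_before), margin(reported_after))  # reported margins
```

A greedy attacker that keeps only steps which lower the observed margin would reject the second query here, even though it moved the input closer to the decision boundary. The reversal holds only within one interval of width `tau`; across interval boundaries the reported margin still follows the true one.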
Related papers
- Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust [65.95797963483729]
Ensemble everything everywhere is a defense to adversarial examples.
We show that this defense is not robust to adversarial attack.
We then use standard adaptive attack techniques to reduce the defense's robust accuracy.
arXiv Detail & Related papers (2024-11-22T10:17:32Z)
- BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack [22.408968332454062]
We study the unique, less well-understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries.
We develop BruSLeAttack, a new, faster (more query-efficient) algorithm for the problem.
Our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems.
arXiv Detail & Related papers (2024-04-08T08:59:26Z)
- PubDef: Defending Against Transfer Attacks From Public Models [6.0012551318569285]
We propose a new practical threat model where the adversary relies on transfer attacks through publicly available surrogate models.
We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective.
Under this threat model, our defense, PubDef, outperforms the state-of-the-art white-box adversarial training by a large margin with almost no loss in the normal accuracy.
arXiv Detail & Related papers (2023-10-26T17:58:08Z)
- The Best Defense is a Good Offense: Adversarial Augmentation against Adversarial Attacks [91.56314751983133]
$A^5$ is a framework to craft a defensive perturbation to guarantee that any attack towards the input in hand will fail.
We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground truth label.
We also show how to apply $A^5$ to create certifiably robust physical objects.
arXiv Detail & Related papers (2023-05-23T16:07:58Z)
- Alternating Objectives Generates Stronger PGD-Based Adversarial Attacks [78.2700757742992]
Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries.
We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different $\ell_\infty$-robust models and 3 datasets.
Our strongest adversarial attack outperforms all of the white-box components of AutoAttack ensemble.
arXiv Detail & Related papers (2022-12-15T17:44:31Z)
- Unifying Gradients to Improve Real-world Robustness for Deep Networks [28.94112170725205]
We propose a real-world defense by Unifying Gradients (UniG) of different data.
UniG presents attackers with a twisted and less informative attack direction.
We implement UniG efficiently by a Hadamard product module which is plug-and-play.
arXiv Detail & Related papers (2022-08-12T11:41:56Z)
- Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack [96.50202709922698]
A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable.
We propose a parameter-free Adaptive Auto Attack (A$^3$) evaluation method which addresses the efficiency and reliability in a test-time-training fashion.
arXiv Detail & Related papers (2022-03-10T04:53:54Z)
- Composite Adversarial Attacks [57.293211764569996]
Adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z)
- RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
We present the Ray Searching attack (RayS), which greatly improves the hard-label attack effectiveness as well as efficiency.
RayS attack can also be used as a sanity check for possible "falsely robust" models.
arXiv Detail & Related papers (2020-06-23T07:01:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.