MORA: Improving Ensemble Robustness Evaluation with Model-Reweighing Attack
- URL: http://arxiv.org/abs/2211.08008v1
- Date: Tue, 15 Nov 2022 09:45:32 GMT
- Title: MORA: Improving Ensemble Robustness Evaluation with Model-Reweighing Attack
- Authors: Yunrui Yu, Xitong Gao, Cheng-Zhong Xu
- Abstract summary: Adversarial attacks can deceive neural networks by adding tiny perturbations to their input data.
We show that recent adversarial attack strategies cannot reliably evaluate ensemble defenses and substantially overestimate their robustness.
We introduce MORA, a model-reweighing attack to steer adversarial example synthesis by reweighing the importance of sub-model gradients.
- Score: 26.37741124166643
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Adversarial attacks can deceive neural networks by adding tiny perturbations to their input data. Ensemble defenses, which are trained to minimize attack transferability among sub-models, offer a promising research direction for improving robustness against such attacks while maintaining high accuracy on natural inputs. We discover, however, that recent state-of-the-art (SOTA) adversarial attack strategies cannot reliably evaluate ensemble defenses and substantially overestimate their robustness. This paper identifies two factors that contribute to this behavior. First, these defenses form ensembles that are notably difficult for existing gradient-based methods to attack, due to gradient obfuscation. Second, ensemble defenses diversify sub-model gradients, making it challenging to defeat all sub-models simultaneously: simply summing their contributions may counteract the overall attack objective. Yet we observe that an ensemble may still be fooled even when most of its sub-models classify correctly. We therefore introduce MORA, a model-reweighing attack that steers adversarial example synthesis by reweighing the importance of sub-model gradients. With MORA, we find that recent ensemble defenses all exhibit varying degrees of overestimated robustness. Compared against recent SOTA white-box attacks, MORA converges orders of magnitude faster while achieving higher attack success rates across all ensemble models examined, under three different ensemble modes (ensembling by softmax, voting, or logits). In particular, most ensemble defenses exhibit near or exactly 0% robustness against MORA under $\ell^\infty$ perturbations within 0.02 on CIFAR-10 and 0.01 on CIFAR-100. We make MORA open source with reproducible results and pre-trained models, and provide a leaderboard of ensemble defenses under various attack strategies.
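The abstract describes the attack only at a high level. The sketch below illustrates the general idea of a model-reweighing attack against an ensemble in PyTorch. It is a minimal, hypothetical reconstruction, not the authors' released MORA code: the sub-model interface, the softmax-based reweighing rule, and the hyperparameters (eps=0.02 as in the CIFAR-10 setting, step size, iteration count) are all assumptions for illustration.

```python
# Minimal sketch of a model-reweighing l_inf attack (hypothetical; not the
# authors' released MORA code). The reweighing rule -- upweighting sub-models
# that are not yet fooled -- is an illustrative assumption.
import torch
import torch.nn.functional as F

def ensemble_predict(sub_models, x, mode="softmax"):
    """Combine sub-model outputs under the three ensemble modes named in the
    abstract: averaging softmax probabilities, averaging logits, or voting."""
    outs = [m(x) for m in sub_models]
    if mode == "logits":
        return torch.stack(outs).mean(0)
    if mode == "softmax":
        return torch.stack([F.softmax(o, dim=1) for o in outs]).mean(0)
    # "voting" is non-differentiable, so attack gradients must come from
    # the sub-models themselves rather than from the ensemble output.
    votes = [F.one_hot(o.argmax(dim=1), o.shape[1]).float() for o in outs]
    return torch.stack(votes).mean(0)

def reweighing_attack(sub_models, x, y, eps=0.02, alpha=0.004, steps=50):
    """PGD-style l_inf attack whose step direction is a *weighted* sum of
    sub-model gradients instead of a plain sum."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        losses = torch.stack(
            [F.cross_entropy(m(x_adv), y) for m in sub_models])
        # Illustrative reweighing heuristic (an assumption, not MORA's exact
        # rule): emphasize sub-models with low loss, i.e. those not yet
        # fooled, so their gradients are not drowned out by the rest.
        weights = F.softmax(-losses.detach(), dim=0)
        grad = torch.autograd.grad((weights * losses).sum(), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

Under these assumptions, one would call `reweighing_attack(sub_models, x, y)` and compare `ensemble_predict(sub_models, x_adv).argmax(1)` against `y` to measure robust accuracy; the actual reweighing rule used by MORA and its convergence behavior are given in the paper.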
Related papers
- Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust [65.95797963483729] (2024-11-22)
  "Ensemble everything everywhere" is a defense against adversarial examples. We show that this defense is not robust to adversarial attack, and we use standard adaptive attack techniques to reduce its robust accuracy.
- From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings [1.8006345220416338] (2024-05-03)
  Adversarial samples pose a serious threat: they can cause a model to misbehave and compromise the performance of deep learning applications. Addressing the robustness of deep learning models has become crucial to understanding and defending against adversarial attacks. Our research focuses on black-box attacks such as SimBA, HopSkipJump, MGAAttack, and boundary attacks, as well as preprocessor-based defensive mechanisms.
- Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses [82.3052187788609] (2021-12-12)
  Adversarial training (AT) is considered one of the most reliable defenses against adversarial attacks. Recent works show generalization improvements with adversarial samples under novel threat models. We propose a novel threat model, the Joint Space Threat Model (JSTM), and develop novel adversarial attacks and defenses under it.
- Mutual Adversarial Training: Learning together is better than going alone [82.78852509965547] (2021-12-09)
  We study how interactions among models affect robustness via knowledge distillation. We propose mutual adversarial training (MAT), in which multiple models are trained together. MAT effectively improves model robustness and outperforms state-of-the-art methods under white-box attacks.
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554] (2021-05-31)
  CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications. We propose adaptive feature alignment (AFA), which is trained to automatically align features across arbitrary attack strengths.
- Lagrangian Objective Function Leads to Improved Unforeseen Attack Generalization in Adversarial Training [0.0] (2021-03-29)
  Adversarial training (AT) is effective at producing a model robust to the attack used during training, but it can generalize poorly to unforeseen attacks. We propose a simple modification to AT that mitigates this issue, and show that our attack is faster than other attack schemes designed for unseen-attack generalization.
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602] (2021-02-09)
  Adversarial examples have long been considered a real threat to machine learning models. We propose an alternative, deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
- Voting based ensemble improves robustness of defensive models [82.70303474487105] (2020-11-28)
  We study whether it is possible to create an ensemble that further improves robustness. By ensembling several state-of-the-art pre-trained defense models, our method achieves 59.8% robust accuracy.