Saliency Diversified Deep Ensemble for Robustness to Adversaries
- URL: http://arxiv.org/abs/2112.03615v1
- Date: Tue, 7 Dec 2021 10:18:43 GMT
- Title: Saliency Diversified Deep Ensemble for Robustness to Adversaries
- Authors: Alex Bogun, Dimche Kostadinov, Damian Borth
- Abstract summary: This work proposes a novel diversity-promoting learning approach for deep ensembles.
The idea is to promote saliency map diversity (SMD) on ensemble members to prevent the attacker from targeting all ensemble members at once.
We empirically show a reduced transferability between ensemble members and improved performance compared to the state-of-the-art ensemble defense.
- Score: 1.9659095632676094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models have shown incredible performance on numerous image
recognition, classification, and reconstruction tasks. Although very appealing
and valuable due to their predictive capabilities, one common threat remains
challenging to resolve. A specifically trained attacker can introduce malicious
input perturbations to fool the network, thus causing potentially harmful
mispredictions. Moreover, these attacks can succeed when the adversary has full
access to the target model (white-box) and even when such access is limited
(black-box setting). The ensemble of models can protect against such attacks
but might be brittle under shared vulnerabilities in its members (attack
transferability). To that end, this work proposes a novel diversity-promoting
learning approach for deep ensembles. The idea is to promote saliency map
diversity (SMD) on ensemble members to prevent the attacker from targeting all
ensemble members at once by introducing an additional term in our learning
objective. During training, this helps us minimize the alignment between model
saliencies to reduce shared member vulnerabilities and, thus, increase ensemble
robustness to adversaries. We empirically show a reduced transferability
between ensemble members and improved performance compared to the
state-of-the-art ensemble defense against medium and high strength white-box
attacks. In addition, we demonstrate that our approach combined with existing
methods outperforms state-of-the-art ensemble algorithms for defense under
white-box and black-box attacks.
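The additional objective term can be illustrated with a short sketch. The following PyTorch-style snippet is a minimal illustration of the idea described in the abstract, not the authors' released implementation: it assumes input-gradient saliency maps and an absolute cosine-similarity penalty between member saliencies, and the weighting factor smd_weight is an arbitrary illustrative choice.

```python
import torch
import torch.nn.functional as F


def saliency(model, x, y):
    # Input-gradient saliency of the per-sample cross-entropy loss,
    # kept differentiable (create_graph=True) so the diversity term
    # can be backpropagated into the model parameters.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return grad.flatten(start_dim=1)  # shape: (batch, input_features)


def ensemble_smd_loss(models, x, y, smd_weight=0.5):
    # Standard cross-entropy for every ensemble member.
    ce = sum(F.cross_entropy(m(x), y) for m in models)

    # Saliency-alignment penalty: average absolute cosine similarity
    # over all member pairs; minimizing it pushes member saliencies apart,
    # reducing shared vulnerabilities (hypothetical choice of alignment measure).
    sals = [saliency(m, x, y) for m in models]
    align, n_pairs = 0.0, 0
    for i in range(len(sals)):
        for j in range(i + 1, len(sals)):
            align = align + F.cosine_similarity(sals[i], sals[j], dim=1).abs().mean()
            n_pairs += 1
    return ce + smd_weight * align / max(n_pairs, 1)
```

In such a setup, this joint loss would replace the plain sum of member cross-entropies during training, so that backpropagation both fits each member and discourages aligned saliency maps across the ensemble.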
Related papers
- Multi-granular Adversarial Attacks against Black-box Neural Ranking Models [111.58315434849047]
We create high-quality adversarial examples by incorporating multi-granular perturbations.
We transform the multi-granular attack into a sequential decision-making process.
Our attack method surpasses prevailing baselines in both attack effectiveness and imperceptibility.
arXiv Detail & Related papers (2024-04-02T02:08:29Z) - Understanding and Improving Ensemble Adversarial Defense [4.504026914523449]
We develop a new error theory dedicated to understanding ensemble adversarial defense.
We propose an effective approach to improve ensemble adversarial defense, named interactive global adversarial training (iGAT)
iGAT boosts their performance by up to 17% on the CIFAR10 and CIFAR100 datasets under both white-box and black-box attacks.
arXiv Detail & Related papers (2023-10-27T20:43:29Z) - Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - Improving Adversarial Robustness with Self-Paced Hard-Class Pair
Reweighting [5.084323778393556]
Adversarial training with untargeted attacks is one of the most recognized defense methods.
We find that the naturally imbalanced inter-class semantic similarity makes hard-class pairs become virtual targets of each other.
We propose to upweight hard-class pair loss in model optimization, which prompts learning discriminative features from hard classes.
arXiv Detail & Related papers (2022-10-26T22:51:36Z) - Resisting Adversarial Attacks in Deep Neural Networks using Diverse
Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify.
We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model.
We present extensive experiments on standard image classification datasets, namely MNIST, CIFAR-10, and CIFAR-100, against state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2022-08-18T08:19:26Z) - Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the
Adversarial Transferability [20.255708227671573]
Black-box adversarial attacks can be transferred from one model to another.
In this work, we propose a novel ensemble attack method called the variance reduced ensemble attack.
Empirical results on the standard ImageNet dataset demonstrate that the proposed method boosts adversarial transferability and significantly outperforms existing ensemble attacks.
arXiv Detail & Related papers (2021-11-21T06:33:27Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Adversarial training may be a double-edged sword [50.09831237090801]
We show that some geometric consequences of adversarial training on the decision boundary of deep networks give an edge to certain types of black-box attacks.
In particular, we define a metric called robustness gain to show that while adversarial training is an effective method to dramatically improve the robustness in white-box scenarios, it may not provide such a good robustness gain against the more realistic decision-based black-box attacks.
arXiv Detail & Related papers (2021-07-24T19:09:16Z) - Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
arXiv Detail & Related papers (2021-03-26T09:36:59Z) - "What's in the box?!": Deflecting Adversarial Attacks by Randomly
Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)