Evaluating Ensemble Robustness Against Adversarial Attacks
- URL: http://arxiv.org/abs/2005.05750v1
- Date: Tue, 12 May 2020 13:20:54 GMT
- Title: Evaluating Ensemble Robustness Against Adversarial Attacks
- Authors: George Adam and Romain Speciel
- Abstract summary: Adversarial examples, which are slightly perturbed inputs generated with the aim of fooling a neural network, are known to transfer between models.
This concept of transferability poses grave security concerns as it leads to the possibility of attacking models in a black box setting.
We introduce a gradient based measure of how effectively an ensemble's constituent models collaborate to reduce the space of adversarial examples targeting the ensemble itself.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples, which are slightly perturbed inputs generated with the
aim of fooling a neural network, are known to transfer between models;
adversaries which are effective on one model will often fool another. This
concept of transferability poses grave security concerns as it leads to the
possibility of attacking models in a black box setting, during which the
internal parameters of the target model are unknown. In this paper, we seek to
analyze and minimize the transferability of adversaries between models within
an ensemble. To this end, we introduce a gradient based measure of how
effectively an ensemble's constituent models collaborate to reduce the space of
adversarial examples targeting the ensemble itself. Furthermore, we demonstrate
that this measure can be utilized during training as to increase an ensemble's
robustness to adversarial examples.
Related papers
- Enhancing Adversarial Robustness via Uncertainty-Aware Distributional Adversarial Training [43.766504246864045]
We propose a novel uncertainty-aware distributional adversarial training method.
Our approach achieves state-of-the-art adversarial robustness and maintains natural performance.
arXiv Detail & Related papers (2024-11-05T07:26:24Z) - JAB: Joint Adversarial Prompting and Belief Augmentation [81.39548637776365]
We introduce a joint framework in which we probe and improve the robustness of a black-box target model via adversarial prompting and belief augmentation.
This framework utilizes an automated red teaming approach to probe the target model, along with a belief augmenter to generate instructions for the target model to improve its robustness to those adversarial probes.
arXiv Detail & Related papers (2023-11-16T00:35:54Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the
Adversarial Transferability [20.255708227671573]
Black-box adversarial attacks can be transferred from one model to another.
In this work, we propose a novel ensemble attack method called the variance reduced ensemble attack.
Empirical results on the standard ImageNet demonstrate that the proposed method could boost the adversarial transferability and outperforms existing ensemble attacks significantly.
arXiv Detail & Related papers (2021-11-21T06:33:27Z) - Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks to a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show we can obtain a Meta-Surrogate Model (MSM) such that attacks to this model can be easier transferred to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z) - Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that adversarial training examples are static and independent to the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z) - "What's in the box?!": Deflecting Adversarial Attacks by Randomly
Deploying Adversarially-Disjoint Models [71.91835408379602]
adversarial examples have been long considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z) - Boosting Black-Box Attack with Partially Transferred Conditional
Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs)
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z) - Certifying Joint Adversarial Robustness for Model Ensembles [10.203602318836445]
Deep Neural Networks (DNNs) are often vulnerable to adversarial examples.
A proposed defense deploys an ensemble of models with the hope that, although the individual models may be vulnerable, an adversary will not be able to find an adversarial example that succeeds against the ensemble.
We consider the joint vulnerability of an ensemble of models, and propose a novel technique for certifying the joint robustness of ensembles.
arXiv Detail & Related papers (2020-04-21T19:38:31Z) - Luring of transferable adversarial perturbations in the black-box
paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model, and is designed to induce the textitluring effect.
Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.