Certifying Joint Adversarial Robustness for Model Ensembles
- URL: http://arxiv.org/abs/2004.10250v1
- Date: Tue, 21 Apr 2020 19:38:31 GMT
- Title: Certifying Joint Adversarial Robustness for Model Ensembles
- Authors: Mainuddin Ahmad Jonas, David Evans
- Abstract summary: Deep Neural Networks (DNNs) are often vulnerable to adversarial examples.
A proposed defense deploys an ensemble of models with the hope that, although the individual models may be vulnerable, an adversary will not be able to find an adversarial example that succeeds against the ensemble.
We consider the joint vulnerability of an ensemble of models, and propose a novel technique for certifying the joint robustness of ensembles.
- Score: 10.203602318836445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are often vulnerable to adversarial
examples. Several proposed defenses deploy an ensemble of models with the hope
that, although the individual models may be vulnerable, an adversary will not
be able to find an adversarial example that succeeds against the ensemble.
Depending on how the ensemble is used, an attacker may need to find a single
adversarial example that succeeds against all, or a majority, of the models in
the ensemble. The effectiveness of ensemble defenses against strong adversaries
depends on the vulnerability spaces of models in the ensemble being disjoint.
We consider the joint vulnerability of an ensemble of models, and propose a
novel technique for certifying the joint robustness of ensembles, building upon
prior works on single-model robustness certification. We evaluate the
robustness of various model ensembles, including models trained using
cost-sensitive robustness to be diverse, to improve understanding of the
potential effectiveness of ensemble models as a defense against adversarial
examples.
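As a concrete illustration of the two attack models described in the abstract (fooling all members versus fooling a majority), the sketch below shows how certificates from any single-model certifier compose into conservative joint-robustness checks. This is a minimal sketch under stated assumptions, not the paper's certification technique, which reasons about the member models jointly and can certify ensembles that this naive composition cannot. The function `certify_single` is a hypothetical placeholder for an existing single-model certifier (e.g., interval bound propagation or randomized smoothing) assumed to return True only when the model provably predicts class `y` on the entire eps-ball around `x`.

```python
# Illustrative sketch only: conservative joint-robustness checks for an ensemble,
# built on top of ANY single-model certifier. `certify_single(model, x, y, eps)` is a
# hypothetical placeholder -- plug in an existing certifier that returns True only when
# `model` provably predicts class `y` for every input within an eps-ball around x.

from typing import Any, Callable, Sequence

Certifier = Callable[[Any, Any, int, float], bool]

def certify_unanimous_ensemble(models: Sequence[Any], x, y: int, eps: float,
                               certify_single: Certifier) -> bool:
    """Attack model: the adversary must fool EVERY member with one perturbed input.
    Sufficient condition: if at least one member is individually certified at (x, y),
    no single perturbation within eps can fool all members simultaneously."""
    return any(certify_single(m, x, y, eps) for m in models)

def certify_majority_ensemble(models: Sequence[Any], x, y: int, eps: float,
                              certify_single: Certifier) -> bool:
    """Attack model: the adversary must fool a MAJORITY of members with one input.
    Sufficient condition: if more than half the members are individually certified,
    every perturbation within eps still leaves a correct majority vote."""
    certified = sum(certify_single(m, x, y, eps) for m in models)
    return certified > len(models) // 2
```

Note that these conditions are only sufficient: an ensemble can be jointly robust even when no individual member is certifiable, provided the members' vulnerable regions do not intersect, which is exactly the situation a joint certification technique is designed to capture.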
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - JAB: Joint Adversarial Prompting and Belief Augmentation [81.39548637776365]
We introduce a joint framework in which we probe and improve the robustness of a black-box target model via adversarial prompting and belief augmentation.
This framework utilizes an automated red teaming approach to probe the target model, along with a belief augmenter to generate instructions for the target model to improve its robustness to those adversarial probes.
arXiv Detail & Related papers (2023-11-16T00:35:54Z) - Robust Ensemble Morph Detection with Domain Generalization [23.026167387128933]
We learn a morph detection model with high generalization to a wide range of morphing attacks and high robustness against different adversarial attacks.
To this aim, we develop an ensemble of convolutional neural networks (CNNs) and Transformer models to benefit from their capabilities simultaneously.
Our exhaustive evaluations demonstrate that the proposed robust ensemble model generalizes to several morphing attacks and face datasets.
arXiv Detail & Related papers (2022-09-16T19:00:57Z) - Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z) - Resisting Adversarial Attacks in Deep Neural Networks using Diverse
Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify.
We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model.
We present extensive experimentations using standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100 against state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2022-08-18T08:19:26Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - Jacobian Ensembles Improve Robustness Trade-offs to Adversarial Attacks [5.70772577110828]
We propose a novel approach, Jacobian Ensembles, to increase robustness against universal adversarial perturbations (UAPs).
Our results show that Jacobian Ensembles achieves previously unseen levels of accuracy and robustness.
arXiv Detail & Related papers (2022-04-19T08:04:38Z) - "What's in the box?!": Deflecting Adversarial Attacks by Randomly
Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z) - Voting based ensemble improves robustness of defensive models [82.70303474487105]
We study whether it is possible to create an ensemble to further improve robustness.
By ensembling several state-of-the-art pre-trained defense models, our method can achieve a 59.8% robust accuracy.
arXiv Detail & Related papers (2020-11-28T00:08:45Z) - Evaluating Ensemble Robustness Against Adversarial Attacks [0.0]
Adversarial examples, which are slightly perturbed inputs generated with the aim of fooling a neural network, are known to transfer between models.
This concept of transferability poses grave security concerns as it leads to the possibility of attacking models in a black box setting.
We introduce a gradient based measure of how effectively an ensemble's constituent models collaborate to reduce the space of adversarial examples targeting the ensemble itself.
arXiv Detail & Related papers (2020-05-12T13:20:54Z)
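As a rough illustration of the kind of diversity signal discussed in the last entry above, the sketch below computes one simple gradient-based proxy: the average pairwise cosine similarity of the members' input-gradients. This is an assumption-laden stand-in, not the measure from "Evaluating Ensemble Robustness Against Adversarial Attacks"; it only captures the intuition that members whose loss gradients point in similar directions tend to share adversarial directions, i.e., their vulnerability spaces overlap.

```python
# Illustrative sketch only: a simple gradient-based diversity proxy for an ensemble.
# Lower average pairwise cosine similarity of input-gradients is a rough sign of more
# disjoint vulnerability spaces; it is NOT the exact measure used in the cited paper.

import itertools
import torch
import torch.nn.functional as F

def input_gradient(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Gradient of the cross-entropy loss w.r.t. the input, flattened per example."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.flatten(start_dim=1)

def mean_pairwise_gradient_similarity(models, x: torch.Tensor, y: torch.Tensor) -> float:
    """Average cosine similarity of input-gradients over all member pairs."""
    grads = [input_gradient(m, x, y) for m in models]
    sims = [F.cosine_similarity(ga, gb, dim=1).mean().item()
            for ga, gb in itertools.combinations(grads, 2)]
    return sum(sims) / len(sims)
```

Given two or more members and a labeled batch (x, y), a value near 1 suggests strongly aligned gradients (heavily overlapping vulnerability spaces), while values near 0 or below suggest members that are less likely to be fooled by the same perturbation.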