"What's in the box?!": Deflecting Adversarial Attacks by Randomly
Deploying Adversarially-Disjoint Models
- URL: http://arxiv.org/abs/2102.05104v1
- Date: Tue, 9 Feb 2021 20:07:13 GMT
- Title: "What's in the box?!": Deflecting Adversarial Attacks by Randomly
Deploying Adversarially-Disjoint Models
- Authors: Sahar Abdelnabi and Mario Fritz
- Abstract summary: Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
- Score: 71.91835408379602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models are now widely deployed in real-world applications.
However, the existence of adversarial examples has long been considered a real
threat to such models. While numerous defenses aiming to improve robustness
have been proposed, many have been shown to be ineffective. As these vulnerabilities
are still nowhere near being eliminated, we propose an alternative
deployment-based defense paradigm that goes beyond the traditional white-box
and black-box threat models. Instead of training a single partially-robust
model, one could train a set of same-functionality yet adversarially-disjoint
models with minimal attack transferability between them. These models could then
be randomly and individually deployed, such that accessing one of them
minimally affects the others. Our experiments on CIFAR-10 and a wide range of
attacks show that we achieve a significantly lower attack transferability
across our disjoint models compared to a baseline of ensemble diversity. In
addition, compared to an adversarially trained set, we achieve a higher average
robust accuracy while maintaining the accuracy of clean examples.
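To make the deployment idea concrete, the sketch below (hypothetical PyTorch, not the authors' implementation) crafts an FGSM example against one randomly chosen deployed model and checks whether it fools the other models in the set; the untrained ResNet-18 stand-ins and random data are assumptions for illustration only.

```python
# Hypothetical sketch of random per-deployment model selection and a
# transferability check. Untrained ResNet-18 models and random data are
# stand-ins; this is not the authors' training or deployment code.
import random
import torch
import torch.nn.functional as F
import torchvision

def fgsm(model, x, y, eps=8 / 255):
    """One-step FGSM attack against `model`."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

# A set of same-functionality models; in the paper these would be trained to be
# adversarially disjoint, here they are just independently initialized.
models = [torchvision.models.resnet18(num_classes=10).eval() for _ in range(3)]

x = torch.rand(1, 3, 32, 32)      # stand-in for a CIFAR-10 image
y = torch.tensor([3])             # stand-in label

deployed = random.choice(models)  # each deployment exposes one random model
x_adv = fgsm(deployed, x, y)      # the attacker crafts against the accessed model

# Transferability check: does the same example fool the other members?
for i, m in enumerate(models):
    pred = m(x_adv).argmax(dim=1).item()
    print(f"model {i}: predicts {pred} (fooled: {pred != y.item()})")
```

Under the adversarially-disjoint training proposed above, the crafted example should fool the accessed model while transferring poorly to the others.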
Related papers
- MultiRobustBench: Benchmarking Robustness Against Multiple Attacks [86.70417016955459]
We present the first unified framework for considering multiple attacks against machine learning (ML) models.
Our framework can model different levels of the learner's knowledge about the test-time adversary.
We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types.
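As a loose illustration of what multi-attack evaluation involves (not the benchmark's actual API), the sketch below measures one model's accuracy under each attack in a small set; the two attacks, the linear model, and the random data are placeholders.

```python
# Hypothetical multi-attack evaluation loop; the attacks, the linear model,
# and the random data are simple placeholders, not the benchmark's attack suite.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def gaussian_noise(model, x, y, sigma=0.1):
    return (x + sigma * torch.randn_like(x)).clamp(0, 1)

def accuracy_under_attack(model, attack, data):
    correct = total = 0
    for x, y in data:
        x_adv = attack(model, x, y)
        correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)).eval()
data = [(torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(4)]

for name, attack in [("fgsm", fgsm), ("gaussian_noise", gaussian_noise)]:
    print(f"{name}: robust accuracy = {accuracy_under_attack(model, attack, data):.2f}")
```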
arXiv Detail & Related papers (2023-02-21T20:26:39Z)
- Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability [20.255708227671573]
Black-box adversarial attacks can be transferred from one model to another.
In this work, we propose a novel ensemble attack method called the variance reduced ensemble attack.
Empirical results on the standard ImageNet demonstrate that the proposed method could boost the adversarial transferability and outperforms existing ensemble attacks significantly.
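For context, here is a minimal sketch of the plain ensemble-transfer baseline that such attacks build on: one signed-gradient step on the average loss of several surrogate models. The variance-reduction mechanism of the paper above is not reproduced, and the untrained surrogates are stand-ins.

```python
# Hypothetical ensemble transfer-attack baseline: one signed-gradient step on
# the average loss of several surrogate models. The paper's variance-reduction
# trick is intentionally omitted; the surrogates are untrained stand-ins.
import torch
import torch.nn.functional as F
import torchvision

surrogates = [torchvision.models.resnet18(num_classes=10).eval() for _ in range(3)]

def ensemble_fgsm(models, x, y, eps=8 / 255):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = sum(F.cross_entropy(m(x_adv), y) for m in models) / len(models)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

x = torch.rand(2, 3, 32, 32)    # stand-in images
y = torch.randint(0, 10, (2,))  # stand-in labels
x_adv = ensemble_fgsm(surrogates, x, y)
print((x_adv - x).abs().max())  # perturbation stays within eps
```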
arXiv Detail & Related papers (2021-11-21T06:33:27Z)
- Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks to a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show that we can obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features across these attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Optimal Transport as a Defense Against Adversarial Attacks [4.6193503399184275]
Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model.
Previous work aimed to align original and adversarial image representations, in a manner similar to domain adaptation, to improve robustness.
We propose to use a loss between distributions that faithfully reflects the ground distance.
This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks.
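As a rough illustration of the main ingredient such a defense relies on, the sketch below computes an entropy-regularized optimal-transport (Sinkhorn) distance between a clean and an adversarial feature batch; this is a generic Sinkhorn routine with made-up features, not the SAT training code.

```python
# Generic entropy-regularized Sinkhorn distance between two feature batches; a
# sketch of the kind of distribution-level loss OT-based defenses rely on, not
# the actual SAT implementation. Features here are random stand-ins.
import torch

def sinkhorn_distance(feat_a, feat_b, reg=0.1, n_iter=200):
    n, m = feat_a.size(0), feat_b.size(0)
    cost = torch.cdist(feat_a, feat_b, p=2)   # pairwise ground distance
    cost = cost / cost.max()                  # normalize to keep the kernel well-conditioned
    a = torch.full((n,), 1.0 / n)             # uniform source marginal
    b = torch.full((m,), 1.0 / m)             # uniform target marginal
    K = torch.exp(-cost / reg)
    u = torch.ones(n)
    for _ in range(n_iter):                   # Sinkhorn fixed-point iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)  # approximate optimal transport plan
    return (plan * cost).sum()

clean_feats = torch.randn(16, 64)        # stand-in clean features
adv_feats = torch.randn(16, 64) + 0.5    # stand-in adversarial features
print(sinkhorn_distance(clean_feats, adv_feats))
```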
arXiv Detail & Related papers (2021-02-05T13:24:36Z)
- Adversarial Learning with Cost-Sensitive Classes [7.6596177815175475]
In adversarial learning, it is sometimes necessary to improve performance on certain special classes, or to give them particular protection against attacks.
This paper proposes a framework combining cost-sensitive classification and adversarial learning together to train a model that can distinguish between protected and unprotected classes.
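One simple way to picture cost-sensitive adversarial learning is to up-weight the adversarial loss on a set of protected classes, as in the hypothetical sketch below; this is a generic weighting scheme with stand-in data, not the paper's exact framework.

```python
# Hypothetical cost-sensitive adversarial training step: examples from
# "protected" classes get a larger weight in the loss on adversarial inputs.
# Generic weighting scheme with stand-in data, not the paper's framework.
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

protected = {0, 1}                       # assumed set of classes to protect
class_weights = torch.tensor([5.0 if c in protected else 1.0 for c in range(10)])

def fgsm(model, x, y, eps=8 / 255):
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

x = torch.rand(8, 3, 32, 32)            # stand-in images
y = torch.randint(0, 10, (8,))          # stand-in labels

x_adv = fgsm(model, x, y)                                       # craft adversarial batch
loss = F.cross_entropy(model(x_adv), y, weight=class_weights)   # cost-sensitive loss
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```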
arXiv Detail & Related papers (2021-01-29T03:15:40Z)
- Voting based ensemble improves robustness of defensive models [82.70303474487105]
We study whether it is possible to create an ensemble to further improve robustness.
By ensembling several state-of-the-art pre-trained defense models, our method can achieve a 59.8% robust accuracy.
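The voting mechanism itself is simple; a hypothetical sketch with untrained stand-in members (rather than actual pre-trained defense models) is shown below.

```python
# Hypothetical majority-vote ensemble over several models; the members here are
# random stand-ins rather than actual pre-trained defense models.
import torch
import torchvision

members = [torchvision.models.resnet18(num_classes=10).eval() for _ in range(3)]

def vote_predict(models, x):
    # Each model votes with its argmax class; the most common class wins.
    votes = torch.stack([m(x).argmax(dim=1) for m in models], dim=0)  # (n_models, batch)
    return votes.mode(dim=0).values                                   # majority vote

x = torch.rand(4, 3, 32, 32)  # stand-in images
print(vote_predict(members, x))
```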
arXiv Detail & Related papers (2020-11-28T00:08:45Z)
- Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
arXiv Detail & Related papers (2020-10-24T21:20:35Z)
- DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles [20.46399318111058]
Adversarial attacks can mislead CNN models with small perturbations, and such perturbations transfer effectively between different models trained on the same dataset.
We propose DVERGE, which isolates the adversarial vulnerability in each sub-model by distilling non-robust features.
The novel diversity metric and training procedure enables DVERGE to achieve higher robustness against transfer attacks.
arXiv Detail & Related papers (2020-09-30T14:57:35Z)
- Certifying Joint Adversarial Robustness for Model Ensembles [10.203602318836445]
Deep Neural Networks (DNNs) are often vulnerable to adversarial examples.
A proposed defense deploys an ensemble of models with the hope that, although the individual models may be vulnerable, an adversary will not be able to find an adversarial example that succeeds against the ensemble.
We consider the joint vulnerability of an ensemble of models, and propose a novel technique for certifying the joint robustness of ensembles.
arXiv Detail & Related papers (2020-04-21T19:38:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.