Minimax Defense against Gradient-based Adversarial Attacks
- URL: http://arxiv.org/abs/2002.01256v1
- Date: Tue, 4 Feb 2020 12:33:13 GMT
- Title: Minimax Defense against Gradient-based Adversarial Attacks
- Authors: Blerta Lindqvist, Rauf Izmailov
- Abstract summary: We introduce a novel approach that uses minimax optimization to foil gradient-based adversarial attacks.
Against CW attacks, our minimax defense achieves 98.07% (MNIST-default 98.93%), 73.90% (CIFAR-10-default 83.14%) and 94.54% (TRAFFIC-default 96.97%).
Our Minimax adversarial approach presents a significant shift in defense strategy for neural network classifiers.
- Score: 2.4403071643841243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art adversarial attacks are aimed at neural network classifiers.
By default, neural networks use gradient descent to minimize their loss
function. The gradient of a classifier's loss function is used by
gradient-based adversarial attacks to generate adversarially perturbed images.
We pose the question whether another type of optimization could give neural
network classifiers an edge. Here, we introduce a novel approach that uses
minimax optimization to foil gradient-based adversarial attacks. Our minimax
classifier is the discriminator of a generative adversarial network (GAN) that
plays a minimax game with the GAN generator. In addition, our GAN generator
projects all points onto a manifold that is different from the original
manifold since the original manifold might be the cause of adversarial attacks.
To measure the performance of our minimax defense, we use adversarial attacks -
Carlini Wagner (CW), DeepFool, Fast Gradient Sign Method (FGSM) - on three
datasets: MNIST, CIFAR-10 and German Traffic Sign (TRAFFIC). Against CW
attacks, our minimax defense achieves 98.07% (MNIST-default 98.93%), 73.90%
(CIFAR-10-default 83.14%) and 94.54% (TRAFFIC-default 96.97%). Against DeepFool
attacks, our minimax defense achieves 98.87% (MNIST), 76.61% (CIFAR-10) and
94.57% (TRAFFIC). Against FGSM attacks, we achieve 97.01% (MNIST), 76.79%
(CIFAR-10) and 81.41% (TRAFFIC). Our Minimax adversarial approach presents a
significant shift in defense strategy for neural network classifiers.
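The abstract describes the architecture only at a high level: the classifier is the discriminator of a GAN, and the generator projects inputs onto a different manifold. The PyTorch sketch below illustrates that idea under explicit assumptions (the generator is modeled as an autoencoder-style projector, the discriminator shares one feature extractor between a K-class head and a real-vs-projected head, and a reconstruction term keeps projected images classifiable). The paper's exact losses, architectures, and training schedule are not given in the abstract, so this is an illustration rather than the authors' implementation.
```python
# Hedged sketch (assumptions noted in comments), not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    """Generator: projects inputs onto a learned manifold.
    Assumption: implemented here as a small convolutional autoencoder
    for 28x28 single-channel images (e.g. MNIST)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

class MinimaxClassifier(nn.Module):
    """Discriminator that doubles as the K-class classifier.
    Assumption: one shared feature extractor, a class head, and a
    real-vs-projected head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.class_head = nn.Linear(64 * 7 * 7, num_classes)
        self.adv_head = nn.Linear(64 * 7 * 7, 1)

    def forward(self, x):
        h = self.features(x)
        return self.class_head(h), self.adv_head(h)

def train_step(G, D, opt_G, opt_D, x, y):
    """One minimax update: D learns to classify real images and to separate
    them from projected ones; G learns to fool D while staying close to x."""
    # discriminator / classifier step
    opt_D.zero_grad()
    logits_real, adv_real = D(x)
    _, adv_fake = D(G(x).detach())
    d_loss = (F.cross_entropy(logits_real, y)
              + F.binary_cross_entropy_with_logits(adv_real, torch.ones_like(adv_real))
              + F.binary_cross_entropy_with_logits(adv_fake, torch.zeros_like(adv_fake)))
    d_loss.backward()
    opt_D.step()

    # generator step
    opt_G.zero_grad()
    x_proj = G(x)
    _, adv_fake = D(x_proj)
    g_loss = (F.binary_cross_entropy_with_logits(adv_fake, torch.ones_like(adv_fake))
              + F.mse_loss(x_proj, x))  # assumption: reconstruction term keeps labels intact
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()

# At test time (assumption): classify the projected input, i.e. use D(G(x))[0].
```
Splitting the discriminator into a class head and an adversarial head is one common way to let it double as a classifier; the paper may combine these roles differently.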
Related papers
- Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA).
For an image recognition task, this means that a small perturbation of the original image can result in it being misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
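For context, the generic Wasserstein DRO training objective (a standard formulation stated here for orientation; the paper's specific reformulation and contributions go beyond it) replaces empirical risk minimization with a worst case over distributions $Q$ within a $p$-Wasserstein ball of radius $\delta$ around the data distribution $P$:
```latex
\min_{\theta} \; \sup_{Q \,:\, W_p(Q, P) \le \delta} \; \mathbb{E}_{Z \sim Q}\big[\ell(\theta, Z)\big]
```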
arXiv Detail & Related papers (2023-06-16T13:41:24Z) - Detection and Mitigation of Byzantine Attacks in Distributed Training [24.951227624475443]
Abnormal Byzantine behavior of the worker nodes can derail the training and compromise the quality of the inference.
Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones ($q$ omniscient adversaries, with full knowledge of the defense protocol, that can change from iteration to iteration) to weak ones ($q$ randomly chosen adversaries with limited collusion abilities).
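As a generic illustration of the robust-aggregation idea referenced above (not this paper's specific scheme), a parameter server can replace the mean of worker gradients with a coordinate-wise median, which stays bounded by honest values as long as honest workers form a majority:
```python
# Generic robust aggregation (coordinate-wise median), shown for illustration;
# not the specific defense proposed in the paper above.
import numpy as np

def median_aggregate(worker_grads: list[np.ndarray]) -> np.ndarray:
    """Aggregate gradients from workers; Byzantine workers may send arbitrary
    vectors, but the coordinate-wise median remains bounded by honest values
    whenever honest workers form a majority."""
    stacked = np.stack(worker_grads, axis=0)   # shape: (num_workers, dim)
    return np.median(stacked, axis=0)

# Example: 4 honest workers plus 1 Byzantine worker sending a huge gradient.
honest = [np.array([0.9, -1.1]), np.array([1.0, -1.0]),
          np.array([1.1, -0.9]), np.array([1.0, -1.0])]
byzantine = [np.array([1e6, 1e6])]
print(median_aggregate(honest + byzantine))    # stays close to [1.0, -1.0]
```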
arXiv Detail & Related papers (2022-08-17T05:49:52Z) - Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks [19.443306494201334]
We introduce several innovations that make white-box targeted attacks follow the intuition of the attacker's goal.
First, we propose a new loss function that explicitly captures the goal of targeted attacks.
Second, we propose a new attack method that uses a further refined version of our loss function, capturing both the misclassification objective and the $L_\infty$ distance limit.
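A hedged sketch of the general idea of folding an $L_\infty$ limit into the attack loss is given below; the exact CGD loss and optimization schedule are not specified in this summary, so the cross-entropy term, the ReLU budget penalty, and the hyperparameters here are illustrative assumptions, not the paper's formulation.
```python
# Illustrative targeted attack with a soft L_inf budget penalty in the loss
# (NOT the exact CGD loss from the paper above).
import torch
import torch.nn.functional as F

def targeted_attack(model, x, target, eps=8/255, steps=100, lr=0.01, lam=10.0):
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(torch.clamp(x + delta, 0.0, 1.0))
        cls_loss = F.cross_entropy(logits, target)        # push toward the target class
        budget_loss = F.relu(delta.abs() - eps).sum()     # penalize exceeding the L_inf budget
        loss = cls_loss + lam * budget_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x + delta.detach(), 0.0, 1.0)
```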
arXiv Detail & Related papers (2021-12-28T17:36:58Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z) - Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep neural networks (DNNs) by perturbing only a few pixels.
Recent efforts additionally bound the perturbation magnitudes with an $L_\infty$ constraint.
We propose a homotopy algorithm to tackle the sparsity and the perturbation bound jointly in one unified framework.
arXiv Detail & Related papers (2021-06-10T20:11:36Z) - PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack [92.94132883915876]
State-of-the-art deep neural networks are sensitive to small input perturbations.
Many defence methods have been proposed that attempt to improve robustness to adversarial noise.
However, evaluating adversarial robustness has proven to be extremely challenging.
arXiv Detail & Related papers (2021-06-03T01:45:48Z) - Transferable Sparse Adversarial Attack [62.134905824604104]
We introduce a generator architecture to alleviate the overfitting issue and thus efficiently craft transferable sparse adversarial examples.
Our method achieves superior inference speed, 700$\times$ faster than other optimization-based methods.
arXiv Detail & Related papers (2021-05-31T06:44:58Z) - GreedyFool: Distortion-Aware Sparse Adversarial Attack [138.55076781355206]
Modern deep neural networks (DNNs) are vulnerable to adversarial samples.
Sparse adversarial samples can fool the target model by only perturbing a few pixels.
We propose a novel two-stage distortion-aware greedy method dubbed "GreedyFool".
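To make the greedy, sparsity-driven idea concrete, here is a minimal single-image sketch that repeatedly perturbs the pixel with the largest loss-gradient magnitude until the prediction flips. GreedyFool's actual algorithm additionally uses a distortion map and a second reduction stage, so this is a simplified illustration, not the paper's method.
```python
# Simplified greedy sparse-perturbation illustration (not GreedyFool itself).
import torch
import torch.nn.functional as F

def greedy_sparse_attack(model, x, y, max_pixels=20, step=0.5):
    """Assumes a single-image batch: x has shape (1, C, H, W), y has shape (1,)."""
    x_adv = x.clone().detach()
    modified = torch.zeros_like(x_adv, dtype=torch.bool)
    for _ in range(max_pixels):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach()
        # pick the not-yet-modified pixel where the loss gradient is largest
        scores = grad.abs().masked_fill(modified, float("-inf"))
        idx = scores.flatten().argmax()
        x_adv.view(-1)[idx] = torch.clamp(
            x_adv.view(-1)[idx] + step * grad.flatten()[idx].sign(), 0.0, 1.0)
        modified.view(-1)[idx] = True
        with torch.no_grad():
            if model(x_adv).argmax(dim=1).item() != y.item():
                break  # prediction flipped: stop perturbing
    return x_adv
```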
arXiv Detail & Related papers (2020-10-26T17:59:07Z) - Patch-wise Attack for Fooling Deep Neural Network [153.59832333877543]
We propose a patch-wise iterative algorithm -- a black-box attack against mainstream normally trained and defended models.
We significantly improve the success rate by 9.2% for defended models and 3.7% for normally trained models on average.
arXiv Detail & Related papers (2020-07-14T01:50:22Z) - Confusing and Detecting ML Adversarial Attacks with Injected Attractors [13.939695351344538]
A machine learning adversarial attack finds adversarial samples of a victim model $\mathcal{M}$ by following the gradient of some attack objective functions.
We take a proactive approach that modifies those functions with the goal of misleading the attacks to some local minima.
We observe that decoders of watermarking schemes exhibit properties of attractors and give a generic method that injects attractors into the victim model.
arXiv Detail & Related papers (2020-03-05T16:02:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.