Improving Transformation-based Defenses against Adversarial Examples
with First-order Perturbations
- URL: http://arxiv.org/abs/2103.04565v3
- Date: Sun, 28 Jan 2024 23:42:12 GMT
- Title: Improving Transformation-based Defenses against Adversarial Examples
with First-order Perturbations
- Authors: Haimin Zhang, Min Xu
- Abstract summary: Studies show that neural networks are susceptible to adversarial attacks.
This exposes a potential threat to neural network-based intelligent systems.
We propose a method for counteracting adversarial perturbations to improve adversarial robustness.
- Score: 16.346349209014182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have been successfully applied in various machine
learning tasks. However, studies show that neural networks are susceptible to
adversarial attacks. This exposes a potential threat to neural network-based
intelligent systems. We observe that the probability of the neural network
outputting the correct result increases when small first-order perturbations
generated for non-predicted class labels are applied to adversarial examples.
Based on this observation, we propose a method for counteracting adversarial
perturbations to improve adversarial robustness. In the proposed method, we
randomly select a number of class labels and generate small first-order
perturbations for these selected labels. The generated perturbations are added
together and then clamped onto a specified space. The obtained perturbation is
finally added to the adversarial example to counteract the adversarial
perturbation contained in the example. The proposed method is applied at
inference time and does not require retraining or finetuning the model. We
experimentally validate the proposed method on CIFAR-10 and CIFAR-100. The
results demonstrate that our method effectively improves the defense
performance of several transformation-based defense methods, especially against
strong adversarial examples generated using more iterations.
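As a rough illustration of the procedure described above, the PyTorch sketch below generates FGSM-style first-order perturbations for a few randomly selected non-predicted labels, sums them, clamps the sum to a small budget, and adds it to the input before classification. The function name, the signed-gradient step, the `step` and `budget` values, and the reading of "generate a perturbation for a label" as descending that label's cross-entropy loss are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def counteract_perturbation(model, x_adv, num_labels=3, num_classes=10,
                            step=2/255, budget=8/255):
    """Sketch of the counteracting procedure: sum small first-order
    perturbations generated for randomly chosen non-predicted labels,
    clamp the sum, and add it to the (possibly adversarial) input.
    Hyperparameters and the FGSM-style step are assumptions."""
    model.eval()
    with torch.no_grad():
        pred = model(x_adv).argmax(dim=1)            # currently predicted labels

    total = torch.zeros_like(x_adv)
    for _ in range(num_labels):
        # choose a random label different from the current prediction
        target = torch.randint(0, num_classes, pred.shape, device=x_adv.device)
        target = torch.where(target == pred, (target + 1) % num_classes, target)

        x = x_adv.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), target)
        grad = torch.autograd.grad(loss, x)[0]
        total = total - step * grad.sign()           # step toward the selected label

    total = total.clamp(-budget, budget)             # clamp onto the specified space
    x_counter = (x_adv + total).clamp(0.0, 1.0)      # counteracted input
    with torch.no_grad():
        return model(x_counter).argmax(dim=1)
```

Because everything happens at inference time, such a step can sit in front of a transformation-based defense without retraining or fine-tuning the model, as the abstract notes.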
Related papers
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z) - Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA).
For an image recognition task, this means that a small perturbation of the original image can cause it to be misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
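For context, the generic Wasserstein DRO training objective (the textbook formulation, not necessarily the exact variant developed in that paper) replaces the empirical risk with a worst case over all data distributions $Q$ within a Wasserstein ball of radius $\delta$ around the reference distribution $P$:

```latex
\min_{\theta} \; \sup_{Q \,:\, W_p(Q, P) \le \delta} \; \mathbb{E}_{(x,y)\sim Q}\!\left[ \ell\big(f_\theta(x), y\big) \right]
```

Pointwise perturbations of each image inside an $\varepsilon$-ball can be viewed as a special case of such distributional shifts (an $\infty$-Wasserstein ball in the image coordinate), which is the usual bridge between AA and DRO.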
arXiv Detail & Related papers (2023-06-16T13:41:24Z) - Hessian-Free Second-Order Adversarial Examples for Adversarial Learning [6.835470949075655]
Adversarial learning with elaborately designed adversarial examples is one of the most effective methods to defend against such an attack.
Most existing methods for generating adversarial examples are based on first-order gradients, which leaves little room for further improving model robustness.
We propose an approximation method that transforms the problem into an optimization in the Krylov subspace, which markedly reduces the computational complexity and speeds up the training procedure.
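As a minimal illustration of the Hessian-free ingredient (not the paper's Krylov-subspace procedure itself), second-order information about the loss can be accessed through Hessian-vector products computed with two backward passes, never materializing the Hessian; Krylov-subspace methods are built from exactly this kind of matrix-vector product. The PyTorch helper below is an assumed sketch.

```python
import torch
import torch.nn.functional as F

def input_hessian_vector_product(model, x, y, v):
    """Compute (d^2 L / dx^2) @ v for the input-space Hessian of the loss
    via double backpropagation, without forming the Hessian explicitly."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]  # first-order gradient
    hvp = torch.autograd.grad((grad * v).sum(), x)[0]          # differentiate <grad, v>
    return hvp
```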
arXiv Detail & Related papers (2022-07-04T13:29:27Z) - Block-Sparse Adversarial Attack to Fool Transformer-Based Text
Classifiers [49.50163349643615]
In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers.
Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5%.
arXiv Detail & Related papers (2022-03-11T14:37:41Z) - Efficient and Robust Classification for Sparse Attacks [34.48667992227529]
We consider perturbations bounded by the $\ell_0$-norm, which have been shown to be effective attacks in the domains of image recognition, natural language processing, and malware detection.
We propose a novel defense method that consists of "truncation" and "adversarial training".
Motivated by the insights we obtain, we extend these components to neural network classifiers.
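One plausible reading of the "truncation" component, sketched here for a single linear score and clearly an assumption rather than the cited paper's exact construction: because an $\ell_0$-bounded attacker can corrupt only a few coordinates, the classifier discards the largest-magnitude per-coordinate contributions, which are the ones such an attacker could most easily dominate.

```python
import torch

def truncated_linear_score(w, b, x, k):
    """Illustrative truncated inner product for l0-robust classification:
    drop the k largest-magnitude per-coordinate contributions w_i * x_i
    before aggregating, so a sparse attacker cannot dominate the score."""
    contrib = w * x                              # per-coordinate contributions
    drop = contrib.abs().topk(k).indices         # coordinates to discard
    mask = torch.ones_like(contrib)
    mask[drop] = 0.0
    return (contrib * mask).sum() + b            # truncated score
```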
arXiv Detail & Related papers (2022-01-23T21:18:17Z) - Searching for an Effective Defender: Benchmarking Defense against
Adversarial Word Substitution [83.84968082791444]
Deep neural networks are vulnerable to intentionally crafted adversarial examples.
Various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models.
arXiv Detail & Related papers (2021-08-29T08:11:36Z) - GradDiv: Adversarial Robustness of Randomized Neural Networks via
Gradient Diversity Regularization [3.9157051137215504]
We investigate the effect of adversarial attacks using proxy gradients on randomized neural networks.
We show that proxy gradients are less effective when the gradients are more scattered.
We propose Gradient Diversity (GradDiv) regularizations that minimize the concentration of the gradients to build a robust neural network.
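A simplified proxy for this idea (not the exact GradDiv regularizers, which use directional concentration measures): sample several stochastic forward passes of the randomized network and penalize how aligned their input gradients are, for example through mean pairwise cosine similarity.

```python
import torch
import torch.nn.functional as F

def gradient_alignment_penalty(model, x, y, num_samples=4):
    """Illustrative gradient-diversity regularizer: mean pairwise cosine
    similarity between input gradients from different stochastic passes of a
    randomized model. Differentiable (create_graph=True) so it can be added
    to the training loss."""
    model.train()                                     # keep stochastic layers active
    dirs = []
    for _ in range(num_samples):
        xi = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(xi), y)
        g = torch.autograd.grad(loss, xi, create_graph=True)[0].flatten(1)
        dirs.append(F.normalize(g, dim=1))            # unit-norm gradient directions
    penalty, pairs = 0.0, 0
    for i in range(num_samples):
        for j in range(i + 1, num_samples):
            penalty = penalty + (dirs[i] * dirs[j]).sum(dim=1).mean()
            pairs += 1
    return penalty / pairs
```

Minimizing such a term scatters the gradients that proxy-gradient attacks rely on, which is the mechanism the summary above points to.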
arXiv Detail & Related papers (2021-07-06T06:57:40Z) - Adversarial Examples Detection with Bayesian Neural Network [57.185482121807716]
We propose a new framework to detect adversarial examples, motivated by the observation that random components can improve the smoothness of predictors.
We propose a novel Bayesian adversarial example detector, short for BATer, to improve the performance of adversarial example detection.
arXiv Detail & Related papers (2021-05-18T15:51:24Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this binary integer programming (BIP) problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z) - A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack
and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z) - Class-Aware Domain Adaptation for Improving Adversarial Robustness [27.24720754239852]
Adversarial training has been proposed to train networks by injecting adversarial examples into the training data.
We propose a novel Class-Aware Domain Adaptation (CADA) method for adversarial defense without directly applying adversarial training.
arXiv Detail & Related papers (2020-05-10T03:45:19Z)