KATANA: Simple Post-Training Robustness Using Test Time Augmentations
- URL: http://arxiv.org/abs/2109.08191v1
- Date: Thu, 16 Sep 2021 19:16:00 GMT
- Title: KATANA: Simple Post-Training Robustness Using Test Time Augmentations
- Authors: Gilad Cohen, Raja Giryes
- Abstract summary: A leading defense against such attacks is adversarial training, a technique in which a DNN is trained to be robust to adversarial attacks.
We propose a new, simple, easy-to-use technique, KATANA, for robustifying an existing pretrained DNN without modifying its weights.
Our strategy achieves state-of-the-art adversarial robustness on diverse attacks with minimal compromise on the classification of natural images.
- Score: 49.28906786793494
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although Deep Neural Networks (DNNs) achieve excellent performance on many
real-world tasks, they are highly vulnerable to adversarial attacks. A leading
defense against such attacks is adversarial training, a technique in which a
DNN is trained to be robust to adversarial attacks by introducing adversarial
noise to its input. This procedure is effective but must be done during the
training phase. In this work, we propose a new, simple, and easy-to-use
technique, KATANA, for robustifying an existing pretrained DNN without
modifying its weights. For every image, we generate N randomized Test Time
Augmentations (TTAs) by applying diverse color, blur, noise, and geometric
transforms. Next, we utilize the DNN's logits output to train a simple random
forest classifier to predict the real class label. Our strategy achieves
state-of-the-art adversarial robustness on diverse attacks with minimal
compromise on the classification of natural images. We also test KATANA
against two adaptive white-box attacks, and it shows excellent results when
combined with adversarial training. Code is available at
https://github.com/giladcohen/KATANA.
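A minimal sketch of the pipeline described above, assuming hypothetical transform parameters, N = 32 augmentations per image, and a 100-tree forest (none of these are the authors' exact settings):

```python
import torch
import torchvision.transforms as T
from sklearn.ensemble import RandomForestClassifier

# Hypothetical pool of randomized test-time augmentations, mirroring the
# color / blur / noise / geometric families named in the abstract.
tta = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.GaussianBlur(kernel_size=3),
    T.RandomAffine(degrees=10, translate=(0.05, 0.05)),
    T.Lambda(lambda x: x + 0.02 * torch.randn_like(x)),  # additive noise
])

def tta_logits(model, image, n_augs=32):
    """Flatten the frozen DNN's logits over n_augs randomized augmentations."""
    model.eval()
    with torch.no_grad():
        logits = [model(tta(image).unsqueeze(0)).squeeze(0)
                  for _ in range(n_augs)]
    return torch.stack(logits).flatten().numpy()  # (n_augs * n_classes,)

# The random forest, not the DNN, is what gets trained:
# X = np.stack([tta_logits(model, img) for img in train_images])
# forest = RandomForestClassifier(n_estimators=100).fit(X, train_labels)
# pred = forest.predict(tta_logits(model, test_image).reshape(1, -1))
```

Since only the forest is fit, the pretrained network's weights remain untouched, which is what makes the defense purely post-training.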
Related papers
- On adversarial training and the 1 Nearest Neighbor classifier [8.248839892711478]
We compare the performance of adversarial training to that of the simple 1 Nearest Neighbor (1NN) classifier.
Experiments cover 135 different binary image classification problems taken from CIFAR10, MNIST, and Fashion-MNIST.
We find that 1NN outperforms almost all of the adversarially trained models in terms of robustness to perturbations that differ only slightly from those used during training.
arXiv Detail & Related papers (2024-04-09T13:47:37Z)
- Do we need entire training data for adversarial training? [2.995087247817663]
We show that we can decrease the training time for any adversarial training algorithm by using only a subset of training data for adversarial training.
We perform adversarial training on the adversarially-prone subset and mix it with vanilla training performed on the entire dataset.
Our results show that when our method-agnostic approach is plugged into FGSM, we achieve a speedup of 3.52x on MNIST and 1.98x on the CIFAR-10 dataset with comparable robust accuracy.
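A rough sketch of this recipe, assuming single-step FGSM and a precomputed loader over the adversarially-prone subset (the subset-selection criterion and mixing schedule are assumptions):

```python
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Single-step FGSM adversarial example, clipped to the image range."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def train_epoch(model, opt, full_loader, prone_loader):
    model.train()
    # Vanilla training on the entire dataset ...
    for x, y in full_loader:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    # ... mixed with adversarial training on the adversarially-prone subset.
    for x, y in prone_loader:
        x_adv = fgsm(model, x, y)  # the attack's backward pass also touches
        opt.zero_grad()            # parameter grads, so zero them afterwards
        F.cross_entropy(model(x_adv), y).backward()
        opt.step()
```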
arXiv Detail & Related papers (2023-03-10T23:21:05Z)
- AccelAT: A Framework for Accelerating the Adversarial Training of Deep Neural Networks through Accuracy Gradient [12.118084418840152]
Adversarial training is exploited to develop a Deep Neural Network (DNN) model that is robust against maliciously altered data.
This paper aims at accelerating the adversarial training to enable fast development of robust DNN models against adversarial attacks.
arXiv Detail & Related papers (2022-10-13T10:31:51Z)
- Two Heads are Better than One: Robust Learning Meets Multi-branch Models [14.72099568017039]
We propose Branch Orthogonality adveRsarial Training (BORT) to obtain state-of-the-art performance with solely the original dataset for adversarial training.
We evaluate our approach on CIFAR-10, CIFAR-100, and SVHN against ℓ∞ norm-bounded perturbations of size ε = 8/255.
arXiv Detail & Related papers (2022-08-17T05:42:59Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has been shown to be an effective approach for achieving robust training.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Practical No-box Adversarial Attacks with Training-free Hybrid Image Transformation [123.33816363589506]
We show the existence of a training-free adversarial perturbation under the no-box threat model.
Motivated by our observation that the high-frequency component (HFC) dominates in low-level features, we attack an image mainly by manipulating its frequency components.
Our method is even competitive with mainstream transfer-based black-box attacks.
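As a toy illustration of frequency-component manipulation in this spirit (the radial FFT mask, cutoff, and damping factor are assumptions, not the paper's hybrid image transformation):

```python
import numpy as np

def damp_high_freq(img, cutoff=0.25, scale=0.3):
    """Damp the high-frequency components of a grayscale image in [0, 1]."""
    h, w = img.shape
    f = np.fft.fftshift(np.fft.fft2(img))          # DC component at center
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    mask = np.where(radius > cutoff, scale, 1.0)   # keep low, damp high
    out = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    return np.clip(out, 0.0, 1.0)
```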
arXiv Detail & Related papers (2022-03-09T09:51:00Z)
- Universal Adversarial Training with Class-Wise Perturbations [78.05383266222285]
Adversarial training is the most widely used method for defending against adversarial attacks.
In this work, we find that a universal adversarial perturbation (UAP) does not attack all classes equally.
We improve state-of-the-art universal adversarial training (UAT) by utilizing class-wise UAPs during adversarial training.
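A condensed sketch of one update to class-wise universal perturbations (the step size, budget, and per-class aggregation rule are assumptions):

```python
import torch
import torch.nn.functional as F

def update_classwise_uaps(model, x, y, uaps, eps=8/255, step=0.01):
    """One ascent step on each class's shared perturbation.

    uaps: (num_classes, C, H, W), one universal perturbation per class.
    """
    delta = uaps[y].clone().detach().requires_grad_(True)  # per-sample copies
    F.cross_entropy(model((x + delta).clamp(0, 1)), y).backward()
    for c in y.unique():
        g = delta.grad[y == c].mean(dim=0)   # aggregate gradients per class
        uaps[c] = (uaps[c] + step * g.sign()).clamp(-eps, eps)
    return uaps
```

During adversarial training, each batch would then be perturbed with its class's UAP before the usual robust loss is minimized.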
arXiv Detail & Related papers (2021-04-07T09:05:49Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Evaluating a Simple Retraining Strategy as a Defense Against Adversarial Attacks [17.709146615433458]
We show how simple algorithms like KNN can be used to determine the labels of the adversarial images needed for retraining.
We present results on two standard datasets, namely CIFAR-10 and TinyImageNet.
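A small sketch of the labeling step (the feature representation and k are assumptions):

```python
from sklearn.neighbors import KNeighborsClassifier

def pseudo_label_adversarial(clean_feats, clean_labels, adv_feats, k=5):
    """Label adversarial images by nearest clean neighbors in feature space."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(clean_feats, clean_labels)
    return knn.predict(adv_feats)

# The pseudo-labeled adversarial images are then added back to the training
# set and the network is retrained on the augmented data.
```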
arXiv Detail & Related papers (2020-07-20T07:49:33Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
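A minimal sketch of the convex-hull sampling step behind DNE (the Dirichlet concentration and the embedding lookup are assumptions):

```python
import numpy as np

def dirichlet_mix(word_vec, synonym_vecs, alpha=1.0, rng=np.random):
    """Sample a virtual embedding from the convex hull of a word and its synonyms."""
    vecs = np.vstack([word_vec] + list(synonym_vecs))    # (k + 1, dim)
    weights = rng.dirichlet(alpha * np.ones(len(vecs)))  # convex coefficients
    return weights @ vecs                                # point inside the hull
```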
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.