Transferring Adversarial Robustness Through Robust Representation
Matching
- URL: http://arxiv.org/abs/2202.09994v1
- Date: Mon, 21 Feb 2022 05:15:40 GMT
- Title: Transferring Adversarial Robustness Through Robust Representation
Matching
- Authors: Pratik Vaishnavi, Kevin Eykholt, Amir Rahmati
- Abstract summary: Adversarial training is one of the few known defenses able to reliably withstand such attacks against neural networks.
We propose Robust Representation Matching (RRM), a low-cost method to transfer the robustness of an adversarially trained model to a new model.
RRM is superior with respect to both model performance and adversarial training time.
- Score: 3.5934248574481717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the widespread use of machine learning, concerns over its security and
reliability have become prevalent. As such, many have developed defenses to
harden neural networks against adversarial examples, imperceptibly perturbed
inputs that are reliably misclassified. Adversarial training, in which
adversarial examples are generated and used during training, is one of the few
known defenses able to reliably withstand such attacks against neural networks.
However, adversarial training imposes a significant training overhead and
scales poorly with model complexity and input dimension. In this paper, we
propose Robust Representation Matching (RRM), a low-cost method to transfer the
robustness of an adversarially trained model to a new model being trained for
the same task irrespective of architectural differences. Inspired by
student-teacher learning, our method introduces a novel training loss that
encourages the student to learn the teacher's robust representations. Compared
to prior works, RRM is superior with respect to both model performance and
adversarial training time. On CIFAR-10, RRM trains a robust model $\sim
1.8\times$ faster than the state-of-the-art. Furthermore, RRM remains effective
on higher-dimensional datasets. On Restricted-ImageNet, RRM trains a ResNet50
model $\sim 18\times$ faster than standard adversarial training.
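Since the method is described above only in prose, a minimal PyTorch-style sketch of the student-teacher idea may help. It assumes models that expose their penultimate features via a hypothetical `return_features=True` flag and a weighting coefficient `lam`, neither of which comes from the paper; the representation layer, distance, and weighting actually used by RRM may differ.

```python
import torch
import torch.nn.functional as F

def rrm_style_loss(student, teacher, x, y, lam=1.0):
    """Student-teacher loss in the spirit of Robust Representation Matching.

    Standard cross-entropy on the task plus a term that pulls the student's
    penultimate representation toward that of a frozen, adversarially trained
    teacher. Both models are assumed to accept `return_features=True` and
    return (logits, penultimate_features); this interface and the MSE distance
    are illustrative assumptions.
    """
    student_logits, student_feats = student(x, return_features=True)
    with torch.no_grad():  # the robust teacher is frozen
        _, teacher_feats = teacher(x, return_features=True)
    task_loss = F.cross_entropy(student_logits, y)
    match_loss = F.mse_loss(student_feats, teacher_feats)
    return task_loss + lam * match_loss
```

If the matching term only requires clean inputs and a frozen robust teacher, each student update costs roughly as much as a standard (non-adversarial) training step, which would be consistent with the reported speedups.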
Related papers
- Pruning Adversarially Robust Neural Networks without Adversarial
Examples [27.952904247130263]
We propose a novel framework to prune a robust neural network while maintaining adversarial robustness.
We leverage concurrent self-distillation and pruning to preserve knowledge from the original model and to regularize the pruned model via the Hilbert-Schmidt Information Bottleneck.
arXiv Detail & Related papers (2022-10-09T17:48:50Z)
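The Hilbert-Schmidt Information Bottleneck regularizer mentioned above builds on the HSIC dependence measure, which is concrete enough to sketch. The estimator below is the standard biased empirical HSIC with Gaussian kernels; how it is combined with pruning and self-distillation, and the kernel and bandwidth choices, are assumptions rather than the authors' exact recipe.

```python
import torch

def gaussian_kernel(x, sigma=1.0):
    # Pairwise squared distances -> RBF kernel matrix (batch, batch).
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimate between two batches of features."""
    n = x.size(0)
    kx, ky = gaussian_kernel(x, sigma), gaussian_kernel(y, sigma)
    h = torch.eye(n, device=x.device) - torch.ones(n, n, device=x.device) / n
    return torch.trace(kx @ h @ ky @ h) / (n - 1) ** 2
```

A self-distillation term could then, for example, maximize hsic() between the pruned model's hidden features and the original model's features, in the spirit of the HSIC bottleneck.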
- Two Heads are Better than One: Robust Learning Meets Multi-branch Models [14.72099568017039]
We propose Branch Orthogonality adveRsarial Training (BORT) to obtain state-of-the-art performance using only the original dataset for adversarial training.
We evaluate our approach on CIFAR-10, CIFAR-100, and SVHN against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$.
arXiv Detail & Related papers (2022-08-17T05:42:59Z)
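The summary names a branch-orthogonality objective but does not define it. As a rough, assumed illustration, the regularizer below penalizes cosine similarity between the features produced by different branches of a multi-branch model; the actual BORT objective may be formulated quite differently.

```python
import torch
import torch.nn.functional as F

def branch_orthogonality_penalty(branch_outputs):
    """Encourage the outputs of different branches to be mutually orthogonal.

    `branch_outputs` is a list of (batch, dim) feature tensors, one per branch.
    Squared per-sample cosine similarity between every pair of branches is
    penalized; this is an illustrative stand-in, not the paper's regularizer.
    """
    feats = [F.normalize(f, dim=1) for f in branch_outputs]
    penalty = 0.0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            penalty = penalty + (feats[i] * feats[j]).sum(dim=1).pow(2).mean()
    return penalty
```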
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
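Of the two variants above, static sparsity is the easier to sketch: fix a pruning mask up front and keep it applied throughout adversarial training. The magnitude-based mask and the re-masking after every optimizer step below are assumptions for illustration; the paper's actual mask selection and its dynamic-sparsity variant are not reproduced here.

```python
import torch

def magnitude_masks(model, sparsity=0.9):
    """Build fixed (static) masks keeping the largest-magnitude weights (assumed criterion)."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # mask weight matrices / conv kernels, not biases
            k = max(1, int(p.numel() * (1 - sparsity)))
            threshold = p.abs().flatten().topk(k).values.min()
            masks[name] = (p.abs() >= threshold).float()
    return masks

def apply_masks(model, masks):
    """Re-apply the static masks so pruned connections stay at zero."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```

During adversarial training, apply_masks(model, masks) would be called after each parameter update so the pruned connections remain zero for the whole run.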
- Mutual Adversarial Training: Learning together is better than going alone [82.78852509965547]
We study how interactions among models affect robustness via knowledge distillation.
We propose mutual adversarial training (MAT) in which multiple models are trained together.
MAT can effectively improve model robustness and outperform state-of-the-art methods under white-box attacks.
arXiv Detail & Related papers (2021-12-09T15:59:42Z)
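As a hedged sketch of the "learning together" mechanism: each model is trained on adversarial examples as usual, plus a KL term that pulls its predictions toward its partner's. The attack, temperature, and weighting below are assumptions; MAT's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def mutual_kd_loss(logits_a, logits_b, y, temperature=1.0, beta=1.0):
    """One model's loss in a two-model mutual adversarial training step.

    `logits_a` and `logits_b` are the two models' outputs on (adversarial)
    inputs. Cross-entropy on the labels plus a KL term pulling model A toward
    model B's detached predictions; the caller applies the symmetric loss to
    model B. Temperature and weighting are illustrative.
    """
    ce = F.cross_entropy(logits_a, y)
    p_b = F.softmax(logits_b.detach() / temperature, dim=1)
    log_p_a = F.log_softmax(logits_a / temperature, dim=1)
    kl = F.kl_div(log_p_a, p_b, reduction="batchmean") * temperature ** 2
    return ce + beta * kl
```

In a two-model setup, the same function is called twice per batch with the roles of the two models swapped.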
- $\ell_\infty$-Robustness and Beyond: Unleashing Efficient Adversarial Training [11.241749205970253]
We show how selecting a small subset of training data provides a more principled approach towards reducing the time complexity of robust training.
Our approach speeds up adversarial training by 2-3 times, with only a small reduction in clean and robust accuracy.
arXiv Detail & Related papers (2021-12-01T09:55:01Z)
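The summary only says that robust training is restricted to a small, carefully chosen subset of the data; the selection rule is the paper's contribution and is not given here. As a purely illustrative stand-in, the snippet below picks the currently highest-loss examples, and the adversarial training step would then be run only on those indices.

```python
import torch

def select_subset(per_example_losses, fraction=0.1):
    """Illustrative subset selection: keep the highest-loss examples.

    This criterion is an assumption, not the paper's method; only the idea of
    adversarially training on a small subset comes from the summary above.
    """
    k = max(1, int(per_example_losses.numel() * fraction))
    return torch.topk(per_example_losses, k).indices
```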
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
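The scheduling idea is concrete enough to sketch: ramp the learning rate up slowly from a small value, then decay it quickly for the remainder of fine-tuning. The shapes and constants below are assumptions, not the paper's exact schedule.

```python
def slow_start_fast_decay(step, total_steps, warmup_steps=500,
                          base_lr=0.01, min_lr=1e-5):
    """Illustrative 'slow start, fast decay' learning-rate schedule.

    Linearly ramps from min_lr to base_lr during warm-up ("slow start"), then
    decays exponentially back toward min_lr ("fast decay").
    """
    if step < warmup_steps:
        return min_lr + (base_lr - min_lr) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (base_lr - min_lr) * (0.01 ** progress)
```

In a fine-tuning loop, the returned value would simply be written into each optimizer parameter group's `lr` before every step.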
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
- To be Robust or to be Fair: Towards Fairness in Adversarial Training [83.42241071662897]
We find that adversarial training algorithms tend to introduce a severe disparity in accuracy and robustness between different groups of data.
We propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem when performing adversarial defense.
arXiv Detail & Related papers (2020-10-13T02:21:54Z)
- Adversarial Training with Stochastic Weight Average [4.633908654744751]
Adversarial training of deep neural networks often suffers from a serious overfitting problem.
In traditional machine learning, one way to relieve overfitting from the lack of data is to use ensemble methods.
In this paper, we propose adversarial training with stochastic weight averaging (SWA).
While performing adversarial training, we aggregate the temporal weight states along the training trajectory.
arXiv Detail & Related papers (2020-09-21T04:47:20Z)
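Aggregating the temporal weight states along the training trajectory is what stochastic weight averaging does, and PyTorch ships an AveragedModel helper for it. The sketch below assumes an `attack(model, x, y)` function (e.g. PGD), per-epoch averaging, and a starting epoch for the average; these details are assumptions rather than the paper's exact schedule.

```python
import torch
import torch.nn.functional as F
from torch.optim.swa_utils import AveragedModel, update_bn

def adversarial_training_with_swa(model, loader, optimizer, attack,
                                  epochs=20, swa_start=10):
    """Adversarial training that keeps a running average of the weights.

    `attack(model, x, y)` is assumed to return adversarial examples. Weights
    are averaged once per epoch from `swa_start` onward; the averaged model
    is what would be evaluated or deployed.
    """
    swa_model = AveragedModel(model)
    for epoch in range(epochs):
        for x, y in loader:
            x_adv = attack(model, x, y)
            loss = F.cross_entropy(model(x_adv), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if epoch >= swa_start:
            swa_model.update_parameters(model)
    update_bn(loader, swa_model)  # recompute BatchNorm statistics for the averaged weights
    return swa_model
```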
- Fast is better than free: Revisiting adversarial training [86.11788847990783]
We show that it is possible to train empirically robust models using a much weaker and cheaper adversary.
We identify a failure mode referred to as "catastrophic overfitting" which may have caused previous attempts to use FGSM adversarial training to fail.
arXiv Detail & Related papers (2020-01-12T20:30:22Z)
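The "much weaker and cheaper adversary" here is single-step FGSM, typically combined with a random start for the perturbation. Below is a minimal sketch of one such training step; the step sizes and the assumed [0, 1] input range are illustrative rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def fgsm_training_step(model, optimizer, x, y, eps=8/255, alpha=10/255):
    """One FGSM adversarial training step with a random start (illustrative constants).

    A random perturbation inside the eps-ball is drawn first, then a single
    signed-gradient step of size alpha is taken before the model update.
    Inputs are assumed to lie in [0, 1].
    """
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    with torch.no_grad():
        delta = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
    optimizer.zero_grad()  # discard gradients from the attack's backward pass
    adv_loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```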