Pruning Adversarially Robust Neural Networks without Adversarial
Examples
- URL: http://arxiv.org/abs/2210.04311v1
- Date: Sun, 9 Oct 2022 17:48:50 GMT
- Title: Pruning Adversarially Robust Neural Networks without Adversarial
Examples
- Authors: Tong Jian, Zifeng Wang, Yanzhi Wang, Jennifer Dy, Stratis Ioannidis
- Abstract summary: We propose a novel framework to prune a robust neural network while maintaining adversarial robustness.
We leverage concurrent self-distillation and pruning to preserve knowledge in the original model as well as regularizing the pruned model via the Hilbert-Schmidt Information Bottleneck.
- Score: 27.952904247130263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial pruning compresses models while preserving robustness. Current
methods require access to adversarial examples during pruning. This
significantly hampers training efficiency. Moreover, as new adversarial attacks
and training methods develop at a rapid rate, adversarial pruning methods need
to be modified accordingly to keep up. In this work, we propose a novel
framework to prune a previously trained robust neural network while maintaining
adversarial robustness, without further generating adversarial examples. We
leverage concurrent self-distillation and pruning to preserve knowledge in the
original model as well as regularizing the pruned model via the Hilbert-Schmidt
Information Bottleneck. We comprehensively evaluate our proposed framework and
show its superior performance in terms of both adversarial robustness and
efficiency when pruning architectures trained on the MNIST, CIFAR-10, and
CIFAR-100 datasets against five state-of-the-art attacks. Code is available at
https://github.com/neu-spiral/PwoA/.
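The abstract describes two ingredients used during pruning: self-distillation from the original robust model and a Hilbert-Schmidt Information Bottleneck (HSIC) regularizer, both computed on clean data only. The PyTorch sketch below illustrates how such an objective could be assembled for a single hidden representation; the function names, kernel bandwidth, temperature, and weighting coefficients (lam_distill, lam_x, lam_y) are illustrative assumptions, not the paper's exact formulation, which is available in the linked repository.

import torch
import torch.nn.functional as F

def gaussian_kernel(x, sigma=5.0):
    # Pairwise Gaussian (RBF) kernel matrix over a batch of flattened features.
    x = x.flatten(start_dim=1)
    sq_dists = torch.cdist(x, x, p=2) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def hsic(k_a, k_b):
    # Biased empirical HSIC estimate between two kernel matrices.
    n = k_a.size(0)
    h = torch.eye(n, device=k_a.device) - torch.ones(n, n, device=k_a.device) / n
    return torch.trace(k_a @ h @ k_b @ h) / (n - 1) ** 2

def pwoa_style_loss(student_logits, teacher_logits, hidden, inputs, labels,
                    lam_distill=1.0, lam_x=0.005, lam_y=0.05, temp=4.0):
    # Clean-data-only objective: task loss + self-distillation + HSIC bottleneck.
    task = F.cross_entropy(student_logits, labels)
    # Self-distillation: the pruned model matches the original robust model's outputs.
    distill = F.kl_div(
        F.log_softmax(student_logits / temp, dim=1),
        F.softmax(teacher_logits / temp, dim=1),
        reduction="batchmean",
    ) * temp ** 2
    # HSIC bottleneck: compress dependence on the raw input, retain dependence on labels.
    k_x = gaussian_kernel(inputs)
    k_h = gaussian_kernel(hidden)
    k_y = gaussian_kernel(F.one_hot(labels, student_logits.size(1)).float())
    bottleneck = lam_x * hsic(k_x, k_h) - lam_y * hsic(k_h, k_y)
    return task + lam_distill * distill + bottleneck

Here the teacher is the unpruned robust model itself, so no adversarial examples are generated at any point during pruning, consistent with the framework described above.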
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Two Heads are Better than One: Robust Learning Meets Multi-branch Models [14.72099568017039]
We propose Branch Orthogonality adveRsarial Training (BORT) to obtain state-of-the-art performance with solely the original dataset for adversarial training.
We evaluate our approach on CIFAR-10, CIFAR-100, and SVHN against ℓ∞ norm-bounded perturbations of size ε = 8/255.
arXiv Detail & Related papers (2022-08-17T05:42:59Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has been shown to be an effective defense approach.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Soft Adversarial Training Can Retain Natural Accuracy [0.0]
We propose a training framework that can retain natural accuracy without sacrificing robustness in a constrained setting.
Our framework specifically targets moderately critical applications which require a reasonable balance between robustness and accuracy.
arXiv Detail & Related papers (2022-06-04T04:13:25Z)
- Finding Dynamics Preserving Adversarial Winning Tickets [11.05616199881368]
Pruning methods have been considered in the adversarial context to reduce model capacity and improve adversarial robustness simultaneously during training.
Existing adversarial pruning methods generally mimic classical pruning methods for natural training, which follow the three-stage 'training-pruning-fine-tuning' pipeline.
We show empirical evidence that AWT preserves the dynamics of adversarial training and achieves performance equal to dense adversarial training.
arXiv Detail & Related papers (2022-02-14T05:34:24Z)
- Robust Binary Models by Pruning Randomly-initialized Networks [57.03100916030444]
We propose ways to obtain robust models against adversarial attacks from randomly-initialized binary networks.
We learn the structure of the robust model by pruning a randomly-initialized binary network.
Our method confirms the strong lottery ticket hypothesis in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T00:05:08Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
- REGroup: Rank-aggregating Ensemble of Generative Classifiers for Robust Predictions [6.0162772063289784]
Defense strategies that adopt adversarial training or random input transformations typically require retraining or fine-tuning the model to achieve reasonable performance.
We find that we can learn a generative classifier by statistically characterizing the neural response of an intermediate layer to clean training samples.
Our proposed approach uses a subset of the clean training data and a pre-trained model, and yet is agnostic to network architectures or the adversarial attack generation method.
arXiv Detail & Related papers (2020-06-18T17:07:19Z)