Towards Alternative Techniques for Improving Adversarial Robustness:
Analysis of Adversarial Training at a Spectrum of Perturbations
- URL: http://arxiv.org/abs/2206.06496v1
- Date: Mon, 13 Jun 2022 22:01:21 GMT
- Title: Towards Alternative Techniques for Improving Adversarial Robustness:
Analysis of Adversarial Training at a Spectrum of Perturbations
- Authors: Kaustubh Sridhar, Souradeep Dutta, Ramneet Kaur, James Weimer, Oleg
Sokolsky, Insup Lee
- Abstract summary: Adversarial training (AT) and its variants have spearheaded progress in improving neural network robustness to adversarial perturbations.
We focus on models trained on a spectrum of $\epsilon$ values.
We identify alternative improvements to AT that otherwise wouldn't have been apparent at a single $\epsilon$.
- Score: 5.18694590238069
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training (AT) and its variants have spearheaded progress in
improving neural network robustness to adversarial perturbations and common
corruptions in the last few years. Algorithm design of AT and its variants are
focused on training models at a specified perturbation strength $\epsilon$ and
only using the feedback from the performance of that $\epsilon$-robust model to
improve the algorithm. In this work, we focus on models trained on a spectrum
of $\epsilon$ values. We analyze three perspectives: model performance,
intermediate feature precision and convolution filter sensitivity. In each, we
identify alternative improvements to AT that otherwise wouldn't have been
apparent at a single $\epsilon$. Specifically, we find that for a PGD attack at
some strength $\delta$, there is an AT model at some slightly larger strength
$\epsilon$, but no greater, that generalizes best to it. Hence, we propose
overdesigning for robustness where we suggest training models at an $\epsilon$
just above $\delta$. Second, we observe (across various $\epsilon$ values) that
robustness is highly sensitive to the precision of intermediate features and
particularly those after the first and second layer. Thus, we propose adding a
simple quantization to defenses that improves accuracy on seen and unseen
adaptive attacks. Third, we analyze convolution filters of each layer of models
at increasing $\epsilon$ and notice that those of the first and second layer
may be solely responsible for amplifying input perturbations. We present our
findings and demonstrate our techniques through experiments with ResNet and
WideResNet models on the CIFAR-10 and CIFAR-10-C datasets.
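As a concrete illustration of the "overdesigning for robustness" suggestion, the sketch below performs standard $\ell_\infty$ PGD adversarial training with the training budget set just above an anticipated attack strength $\delta$. This is a minimal sketch assuming PyTorch; the helper names, step counts, and the specific values (8/255 and 10/255) are illustrative, not the paper's exact configuration.

```python
# Minimal sketch (assumed PyTorch) of "overdesigning for robustness":
# standard L_inf PGD adversarial training, with the training budget set
# just above the attack strength we expect at test time. The epsilon
# values, step sizes, and helper names are illustrative only.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """L_inf PGD: random start in the eps-ball, then ascend the loss."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

delta_eps = 8 / 255    # anticipated attack strength at test time
train_eps = 10 / 255   # "overdesign": train at an epsilon just above delta_eps

def train_step(model, optimizer, x, y):
    x_adv = pgd_attack(model, x, y, eps=train_eps, alpha=train_eps / 4, steps=10)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```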
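The second finding, that robustness is sensitive to the precision of features after the first and second layers, suggests a drop-in quantization of early activations. Below is a hedged sketch assuming PyTorch and a torchvision-style ResNet (attributes `conv1`, `layer1`); the rounding step size and the straight-through estimator are illustrative choices, not the paper's exact defense.

```python
# Minimal sketch (assumed PyTorch, torchvision-style ResNet) of quantizing
# early intermediate features: activations after the stem convolution and the
# first residual stage are rounded to a coarse grid, with a straight-through
# estimator so training still works. The step size is an illustrative choice.
import torch
import torch.nn as nn

class _RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, step):
        return torch.round(x / step) * step   # snap activations to a grid

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                 # straight-through gradient for x

class FeatureQuant(nn.Module):
    def __init__(self, step=0.5):
        super().__init__()
        self.step = step

    def forward(self, x):
        return _RoundSTE.apply(x, self.step)

def add_early_quantization(resnet, step=0.5):
    """Attach quantization after conv1 and layer1 (the first/second layers)."""
    resnet.conv1 = nn.Sequential(resnet.conv1, FeatureQuant(step))
    resnet.layer1 = nn.Sequential(resnet.layer1, FeatureQuant(step))
    return resnet
```

A defense built this way can then be compared with and without the quantization against both seen and unseen adaptive attacks.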
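The third analysis, measuring how strongly each layer's filters amplify input perturbations, can be approximated by comparing clean and perturbed feature maps layer by layer. The sketch below records a simple norm-ratio "gain" for every convolution via forward hooks; this is an assumed measurement for illustration, not necessarily the metric used in the paper.

```python
# Minimal sketch (assumed PyTorch) of a per-layer sensitivity probe: run a
# clean input and a perturbed copy through the model, hook every Conv2d, and
# report how much each layer amplifies the perturbation as a norm ratio.
import torch
import torch.nn as nn

@torch.no_grad()
def perturbation_gain(model, x, eps=8 / 255):
    """Return ||f_l(x + d) - f_l(x)|| / ||d|| for every Conv2d layer l."""
    d = eps * torch.randn_like(x).sign()
    feats = {}

    def hook(name):
        def fn(module, inputs, output):
            feats.setdefault(name, []).append(output.detach())
        return fn

    handles = [m.register_forward_hook(hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.Conv2d)]
    model(x)        # first pass records clean features
    model(x + d)    # second pass records perturbed features
    for h in handles:
        h.remove()

    return {name: (pert - clean).norm().item() / d.norm().item()
            for name, (clean, pert) in feats.items()}
```

In line with the abstract, one would expect the largest gains to appear at the first and second convolutional layers.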
Related papers
- Class-Conditioned Transformation for Enhanced Robust Image Classification [19.738635819545554]
We propose a novel test-time algorithm that enhances Adversarially Trained (AT) models.
Our method operates through COnditional image transformation and DIstance-based Prediction (CODIP).
The proposed method achieves state-of-the-art results demonstrated through extensive experiments on various models, AT methods, datasets, and attack types.
arXiv Detail & Related papers (2023-03-27T17:28:20Z) - Differentially Private Image Classification from Features [53.75086935617644]
Leveraging transfer learning has been shown to be an effective strategy for training large models with Differential Privacy.
Recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP.
arXiv Detail & Related papers (2022-11-24T04:04:20Z) - Two Heads are Better than One: Robust Learning Meets Multi-branch Models [14.72099568017039]
We propose Branch Orthogonality adveRsarial Training (BORT) to obtain state-of-the-art performance with solely the original dataset for adversarial training.
We evaluate our approach on CIFAR-10, CIFAR-100, and SVHN against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$.
arXiv Detail & Related papers (2022-08-17T05:42:59Z) - Removing Batch Normalization Boosts Adversarial Training [83.08844497295148]
Adversarial training (AT) defends deep neural networks against adversarial attacks.
A major bottleneck is the widely used batch normalization (BN), which struggles to model the different statistics of clean and adversarial training samples in AT.
Our normalizer-free robust training (NoFrost) method extends recent advances in normalizer-free networks to AT.
arXiv Detail & Related papers (2022-07-04T01:39:37Z) - Data Augmentation Can Improve Robustness [21.485435979018256]
Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training.
We demonstrate that, when combined with model weight averaging, data augmentation can significantly boost robust accuracy.
In particular, against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our model reaches 60.07% robust accuracy without using any external data.
arXiv Detail & Related papers (2021-11-09T18:57:00Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
The method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Adversarial robustness against multiple $l_p$-threat models at the price
of one and how to quickly fine-tune robust models to another threat model [79.05253587566197]
Adversarial training (AT) to achieve adversarial robustness with respect to a single $l_p$-threat model has been discussed extensively.
In this paper we develop a simple and efficient training scheme to achieve adversarial robustness against the union of $l_p$-threat models.
arXiv Detail & Related papers (2021-05-26T12:20:47Z) - A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via
Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z) - Understanding Frank-Wolfe Adversarial Training [1.2183405753834557]
Adversarial Training (AT) is a technique that approximately solves a robust optimization problem to minimize the worst-case loss.
A Frank-Wolfe adversarial training approach is presented and shown to provide a level of robustness competitive with PGD-AT.
arXiv Detail & Related papers (2020-12-22T21:36:52Z) - Improving Robustness and Generality of NLP Models Using Disentangled
Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - Towards Deep Learning Models Resistant to Large Perturbations [0.0]
Adversarial robustness has proven to be a required property of machine learning algorithms.
We show that the well-established algorithm called "adversarial training" fails to train a deep neural network given a large, but reasonable, perturbation magnitude.
arXiv Detail & Related papers (2020-03-30T12:03:09Z)