Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
- URL: http://arxiv.org/abs/2404.09349v2
- Date: Wed, 10 Jul 2024 17:32:29 GMT
- Title: Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
- Authors: Brian R. Bartoldson, James Diffenderfer, Konstantinos Parasyris, Bhavya Kailkhura
- Abstract summary: We analyze how model size, dataset size, and synthetic data quality affect robustness by developing the first scaling laws for adversarial training.
Our scaling laws reveal inefficiencies in prior art and provide actionable feedback to advance the field.
- Score: 19.100334346597982
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper revisits the simple, long-studied, yet still unsolved problem of making image classifiers robust to imperceptible perturbations. Taking CIFAR10 as an example, SOTA clean accuracy is about $100$%, but SOTA robustness to $\ell_{\infty}$-norm bounded perturbations barely exceeds $70$%. To understand this gap, we analyze how model size, dataset size, and synthetic data quality affect robustness by developing the first scaling laws for adversarial training. Our scaling laws reveal inefficiencies in prior art and provide actionable feedback to advance the field. For instance, we discovered that SOTA methods diverge notably from compute-optimal setups, using excess compute for their level of robustness. Leveraging a compute-efficient setup, we surpass the prior SOTA with $20$% ($70$%) fewer training (inference) FLOPs. We trained various compute-efficient models, with our best achieving $74$% AutoAttack accuracy ($+3$% gain). However, our scaling laws also predict robustness slowly grows then plateaus at $90$%: dwarfing our new SOTA by scaling is impractical, and perfect robustness is impossible. To better understand this predicted limit, we carry out a small-scale human evaluation on the AutoAttack data that fools our top-performing model. Concerningly, we estimate that human performance also plateaus near $90$%, which we show to be attributable to $\ell_{\infty}$-constrained attacks' generation of invalid images not consistent with their original labels. Having characterized limiting roadblocks, we outline promising paths for future research.
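The abstract's key quantitative claim is that robust accuracy follows a scaling law that improves only slowly with compute and plateaus near $90$%. As a rough illustration of how such a ceiling can be read off a fitted curve, the sketch below fits a saturating power law to hypothetical (training compute, AutoAttack accuracy) points; the functional form, data points, and initial guesses are illustrative assumptions, not the paper's actual law or measurements.

```python
# Minimal sketch: fit a saturating power law, acc(C) = ceiling - scale * C^(-exponent),
# to hypothetical (compute, robust accuracy) observations and read off the ceiling.
# All numbers are made up for illustration; they are not the paper's data.
import numpy as np
from scipy.optimize import curve_fit

def robust_accuracy(c, ceiling, scale, exponent):
    """Saturating power law: accuracy approaches `ceiling` as compute c grows."""
    return ceiling - scale * c ** (-exponent)

flops = np.array([1e18, 3e18, 1e19, 3e19, 1e20, 3e20])  # hypothetical training compute (FLOPs)
acc = np.array([55.0, 59.5, 64.0, 67.3, 70.5, 73.0])    # hypothetical AutoAttack accuracy (%)

c = flops / flops.min()  # normalize compute so the fit is numerically stable
params, _ = curve_fit(robust_accuracy, c, acc, p0=[90.0, 30.0, 0.1], maxfev=10000)
ceiling, scale, exponent = params

print(f"fitted ceiling: {ceiling:.1f}%  exponent: {exponent:.3f}")
print(f"extrapolated accuracy at 10000x compute: {robust_accuracy(1e4, *params):.1f}%")
```

Under this toy fit, accuracy keeps rising with compute but never crosses the fitted ceiling, which is the qualitative behavior the paper reports when it argues that scaling alone cannot reach perfect robustness.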
Related papers
- Language models scale reliably with over-training and on downstream tasks [121.69867718185125]
Scaling laws are useful guides for derisking expensive training runs.
However, there remain gaps between current studies and how language models are trained.
For instance, scaling laws mostly predict loss on next-token prediction, but models are usually compared on downstream task performance.
arXiv Detail & Related papers (2024-03-13T13:54:00Z) - Not All Models Are Equal: Predicting Model Transferability in a
Self-challenging Fisher Space [51.62131362670815]
This paper addresses the problem of ranking the pre-trained deep neural networks and screening the most transferable ones for downstream tasks.
It proposes a new transferability metric called Self-challenging Fisher Discriminant Analysis (SFDA).
arXiv Detail & Related papers (2022-07-07T01:33:25Z) - Towards Alternative Techniques for Improving Adversarial Robustness:
Analysis of Adversarial Training at a Spectrum of Perturbations [5.18694590238069]
Adversarial training (AT) and its variants have spearheaded progress in improving neural network robustness to adversarial perturbations.
We focus on models trained on a spectrum of $\epsilon$ values.
We identify alternative improvements to AT that otherwise wouldn't have been apparent at a single $\epsilon$.
arXiv Detail & Related papers (2022-06-13T22:01:21Z) - Simple Baselines for Image Restoration [79.48718779396971]
We propose a simple baseline that exceeds the state-of-the-art (SOTA) methods and is computationally efficient.
We derive a Nonlinear Activation Free Network, namely NAFNet, from the baseline.
SOTA results are achieved on various challenging benchmarks, e.g. 33.69 dB PSNR on GoPro (for image deblurring), exceeding the previous SOTA by 0.38 dB with only 8.4% of its computational cost; 40.30 dB PSNR on SIDD (for image denoising), exceeding the previous SOTA by 0.28 dB with less than half of its computational cost.
arXiv Detail & Related papers (2022-04-10T12:48:38Z) - Data Augmentation Can Improve Robustness [21.485435979018256]
Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training.
We demonstrate that, when combined with model weight averaging, data augmentation can significantly boost robust accuracy.
In particular, against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our model reaches 60.07% robust accuracy without using any external data (a generic PGD adversarial training step for this threat model is sketched after this list).
arXiv Detail & Related papers (2021-11-09T18:57:00Z) - Exploring Sparse Expert Models and Beyond [51.90860155810848]
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost.
We propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing.
This strategy improves the model quality while maintaining constant computational cost, and our further exploration of extremely large-scale models shows that it is more effective for training larger models.
arXiv Detail & Related papers (2021-05-31T16:12:44Z) - Fixing Data Augmentation to Improve Adversarial Robustness [21.485435979018256]
Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training.
In this paper, we focus on both heuristics-driven and data-driven augmentations as a means to reduce robust overfitting.
We show large absolute improvements of +7.06% and +5.88% in robust accuracy compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-02T18:58:33Z) - Learnable Boundary Guided Adversarial Training [66.57846365425598]
We use the logits from one clean model to guide the learning of another, robust model.
We achieve new state-of-the-art robustness on CIFAR-100 without additional real or synthetic data.
arXiv Detail & Related papers (2020-11-23T01:36:05Z) - Uncovering the Limits of Adversarial Training against Norm-Bounded
Adversarial Examples [47.27255244183513]
We study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness.
We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging.
arXiv Detail & Related papers (2020-10-07T18:19:09Z)
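Several entries above, like the main paper, measure robustness against $\ell_\infty$-bounded perturbations of size $\epsilon = 8/255$ obtained via adversarial training. The sketch below shows one generic PGD-based adversarial training step in PyTorch; the model, optimizer, and hyperparameters are placeholders, and it deliberately omits the recipe-specific ingredients discussed above (weight averaging, augmentation, synthetic or unlabeled data).

```python
# Minimal, generic sketch of one PGD-based adversarial training step under an
# l_inf budget of eps = 8/255 (the threat model used by several papers above).
# Model, optimizer, and hyperparameters are placeholders, not any paper's recipe.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft l_inf-bounded adversarial examples with projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project to the l_inf ball
            x_adv = x_adv.clamp(0, 1)                              # stay a valid image
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One update of standard adversarial training: train on the crafted examples."""
    model.eval()                    # freeze batch-norm statistics while attacking (a common choice)
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the batches would come from a CIFAR-10 loader, and the improvements catalogued above (model weight averaging, stronger data augmentation, generated or unlabeled data, activation choices) are layered on top of this basic loop.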