Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?
- URL: http://arxiv.org/abs/2112.01601v4
- Date: Tue, 20 Feb 2024 13:43:48 GMT
- Title: Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?
- Authors: Peter Lorenz, Dominik Strassel, Margret Keuper and Janis Keuper
- Abstract summary: We argue that the alteration of data by AutoAttack with l-inf, eps = 8/255 is unrealistically strong, resulting in close-to-perfect detection rates of adversarial samples.
We also show that other attack methods are much harder to detect while achieving similar success rates.
- Score: 20.660465258314314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, RobustBench (Croce et al. 2020) has become a widely recognized
benchmark for the adversarial robustness of image classification networks. In
its most commonly reported sub-task, RobustBench evaluates and ranks the
adversarial robustness of trained neural networks on CIFAR10 under AutoAttack
(Croce and Hein 2020b) with l-inf perturbations limited to eps = 8/255. With
the currently best-performing models scoring around 60% of the baseline, it is
fair to characterize this benchmark as quite challenging. Despite its general
acceptance in recent literature, we aim to foster discussion about the
suitability of RobustBench as a key indicator of robustness that generalizes
to practical applications. Our line of argumentation against this is twofold
and supported by extensive experiments presented in this paper: We argue that
I) the alteration of data by AutoAttack with l-inf, eps = 8/255 is
unrealistically strong, resulting in close-to-perfect detection rates of
adversarial samples even by simple detection algorithms and human observers;
we also show that other attack methods are much harder to detect while
achieving similar success rates. II) Results on low-resolution datasets like
CIFAR10 do not generalize well to higher-resolution images, as gradient-based
attacks appear to become even more detectable with increasing resolution.
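For context, the benchmark setting discussed above can be reproduced roughly as follows. This is a minimal sketch, assuming the publicly available robustbench and autoattack Python packages; the model name, sample count, and batch size are illustrative choices, not part of the paper.

    import torch
    from robustbench.data import load_cifar10
    from robustbench.utils import load_model
    from autoattack import AutoAttack

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load a leaderboard model (name chosen for illustration) and CIFAR10 test data.
    model = load_model(model_name="Standard", dataset="cifar10",
                       threat_model="Linf").to(device).eval()
    x_test, y_test = load_cifar10(n_examples=128)
    x_test, y_test = x_test.to(device), y_test.to(device)

    # AutoAttack with the l-inf budget eps = 8/255 used by the RobustBench sub-task.
    adversary = AutoAttack(model, norm="Linf", eps=8 / 255,
                           version="standard", device=device)
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=64)

    # Robust accuracy on the adversarially altered samples.
    with torch.no_grad():
        robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
    print(f"robust accuracy under AutoAttack (l-inf, eps = 8/255): {robust_acc:.3f}")

The x_adv tensor produced by such a run contains exactly the kind of altered data whose detectability (by simple algorithms and human observers) the paper questions.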
Related papers
- Adversarial Robustness Overestimation and Instability in TRADES [4.063518154926961]
TRADES sometimes yields disproportionately high PGD validation accuracy compared to the AutoAttack testing accuracy in the multiclass classification task.
This discrepancy highlights a significant overestimation of robustness for these instances, potentially linked to gradient masking.
arXiv Detail & Related papers (2024-10-10T07:32:40Z)
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA).
For an image recognition task, this means that a small perturbation of the original image can cause it to be misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
arXiv Detail & Related papers (2023-06-16T13:41:24Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show that GREAT Score correlates highly with, and costs significantly less than, attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z)
- UNBUS: Uncertainty-aware Deep Botnet Detection System in Presence of Perturbed Samples [1.2691047660244335]
Botnet detection requires extremely low false-positive rates (FPR), which are not commonly attainable in contemporary deep learning.
This paper presents two LSTM-based botnet classification algorithms with accuracy higher than 98%.
arXiv Detail & Related papers (2022-04-18T21:49:14Z)
- Detecting AutoAttack Perturbations in the Frequency Domain [18.91242463856906]
Adversarial attacks on image classification networks by the AutoAttack framework have drawn a lot of attention.
In this paper, we investigate the spatial and frequency domain properties of AutoAttack and propose an alternative defense.
Instead of hardening a network, we detect adversarial attacks during inference and reject manipulated inputs (a minimal sketch of a simple frequency-domain check of this kind appears after this list).
arXiv Detail & Related papers (2021-11-16T21:20:37Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA), which is trained to automatically generate and align features of arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
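As referenced in the frequency-domain detection entry above, the paper's first argument is that l-inf, eps = 8/255 AutoAttack perturbations add conspicuous high-frequency content that even very simple statistics can flag. The following is a minimal illustrative sketch of such a hand-rolled spectral check, not the detector from the cited paper; the radius fraction and the threshold are assumptions that would need calibration on clean validation images.

    import numpy as np

    def high_freq_energy(image: np.ndarray, radius_frac: float = 0.25) -> float:
        """Fraction of spectral energy outside a central low-frequency disc.

        `image` is an (H, W) or (H, W, C) array with values in [0, 1];
        color channels are averaged to a single grayscale plane.
        """
        gray = image.mean(axis=-1) if image.ndim == 3 else image
        spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
        h, w = spectrum.shape
        yy, xx = np.ogrid[:h, :w]
        dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
        high = spectrum[dist > radius_frac * min(h, w)].sum()
        return float(high / (spectrum.sum() + 1e-12))

    def looks_adversarial(image: np.ndarray, threshold: float = 0.35) -> bool:
        # Flag images whose high-frequency energy exceeds a calibrated threshold.
        return high_freq_energy(image) > threshold

A single scalar statistic like this is, of course, far weaker than a learned detector; the point of the sketch is only that perturbations as strong as eps = 8/255 leave traces that even such crude measures can pick up.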