Mind the box: $l_1$-APGD for sparse adversarial attacks on image
classifiers
- URL: http://arxiv.org/abs/2103.01208v3
- Date: Fri, 24 Nov 2023 15:41:48 GMT
- Title: Mind the box: $l_1$-APGD for sparse adversarial attacks on image
classifiers
- Authors: Francesco Croce, Matthias Hein
- Abstract summary: We study the expected sparsity of the steepest descent step for this effective threat model.
We propose an adaptive form of PGD which is highly effective even with a small budget of iterations.
- Score: 61.46999584579775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that when also taking into account the image domain $[0,1]^d$,
established $l_1$-projected gradient descent (PGD) attacks are suboptimal as
they do not consider that the effective threat model is the intersection of the
$l_1$-ball and $[0,1]^d$. We study the expected sparsity of the steepest
descent step for this effective threat model and show that the exact projection
onto this set is computationally feasible and yields better performance.
Moreover, we propose an adaptive form of PGD which is highly effective even
with a small budget of iterations. Our resulting $l_1$-APGD is a strong
white-box attack showing that prior works overestimated their $l_1$-robustness.
Using $l_1$-APGD for adversarial training we get a robust classifier with SOTA
$l_1$-robustness. Finally, we combine $l_1$-APGD and an adaptation of the
Square Attack to $l_1$ into $l_1$-AutoAttack, an ensemble of attacks which
reliably assesses adversarial robustness for the threat model of $l_1$-ball
intersected with $[0,1]^d$.
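As a rough illustration of the projection discussed in the abstract, the following NumPy sketch projects a perturbation onto the intersection of the $l_1$-ball and the box $[0,1]^d$ via bisection on the soft-thresholding (Lagrange) parameter. The function and variable names are illustrative and this is not the paper's own routine, which computes the exact projection directly; a minimal sketch under those assumptions:

```python
# Minimal sketch: project a perturbation onto the effective threat model
#   {delta : ||delta||_1 <= eps  and  x + delta in [0,1]^d}.
# For a fixed multiplier lam, the per-coordinate minimizer is a box-clipped
# soft-threshold, and its l1 norm decreases monotonically in lam, so bisection
# finds the multiplier at which the l1 constraint becomes tight.
import numpy as np

def project_l1_box(delta, x, eps, iters=60):
    """Project `delta` onto {d : ||d||_1 <= eps, 0 <= x + d <= 1} (flattened arrays)."""
    lo, hi = -x, 1.0 - x                        # per-pixel bounds on the perturbation
    boxed = np.clip(delta, lo, hi)              # projection onto the box alone
    if np.abs(boxed).sum() <= eps:
        return boxed                            # l1 constraint already satisfied
    lam_lo, lam_hi = 0.0, np.abs(delta).max()   # at lam_hi the candidate is all zeros
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        cand = np.clip(np.sign(delta) * np.maximum(np.abs(delta) - lam, 0.0), lo, hi)
        if np.abs(cand).sum() > eps:
            lam_lo = lam                        # still outside the l1-ball: shrink more
        else:
            lam_hi = lam
    return np.clip(np.sign(delta) * np.maximum(np.abs(delta) - lam_hi, 0.0), lo, hi)
```

A plain $l_1$-PGD iteration would then take a gradient step on the loss and re-project, e.g. `delta = project_l1_box(delta + step_size * grad, x, eps)`; the paper's $l_1$-APGD additionally uses sparse (top-$k$ coordinate) steepest-descent updates and an adaptive step size, which this sketch does not reproduce.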
Related papers
- Stochastic Bandits Robust to Adversarial Attacks [33.278131584647745]
This paper investigates multi-armed bandit algorithms that are robust to adversarial attacks.
We study two cases of this model, with or without the knowledge of an attack budget $C$.
We devise two types of algorithms with regret bounds having additive or multiplicative $C$ dependence terms.
arXiv Detail & Related papers (2024-08-16T17:41:35Z) - Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - $σ$-zero: Gradient-based Optimization of $\ell_0$-norm Adversarial Examples [14.17412770504598]
We show that $\ell_\infty$-norm constraints can be used to craft input perturbations.
We propose a novel $\ell_0$-norm attack called $\sigma$-zero.
It outperforms all competing adversarial attacks in terms of success rate, perturbation size, and efficiency.
arXiv Detail & Related papers (2024-02-02T20:08:11Z) - Class-Conditioned Transformation for Enhanced Robust Image Classification [19.738635819545554]
We propose a novel test-time algorithm that enhances Adversarially Trained (AT) models.
Our method operates through COnditional image transformation and DIstance-based Prediction (CODIP).
The proposed method achieves state-of-the-art results demonstrated through extensive experiments on various models, AT methods, datasets, and attack types.
arXiv Detail & Related papers (2023-03-27T17:28:20Z) - Adversarial robustness against multiple $l_p$-threat models at the price
of one and how to quickly fine-tune robust models to another threat model [79.05253587566197]
Adversarial training (AT) to achieve adversarial robustness w.r.t. a single $l_p$-threat model has been discussed extensively.
In this paper we develop a simple and efficient training scheme to achieve adversarial robustness against the union of $l_p$-threat models.
arXiv Detail & Related papers (2021-05-26T12:20:47Z) - Improving Robustness and Generality of NLP Models Using Disentangled
Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first result of the optimal minimax guarantees for the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Toward Adversarial Robustness via Semi-supervised Robust Training [93.36310070269643]
Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs).
We propose a novel defense method, robust training (RT), which jointly minimizes two separated risks ($R_{stand}$ and $R_{rob}$); a generic sketch of such a joint objective appears after this entry.
arXiv Detail & Related papers (2020-03-16T02:14:08Z)
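As a loose, generic illustration of jointly minimizing a standard risk and a robust risk as mentioned in the RT blurb above: the paper's exact definitions of $R_{stand}$ and $R_{rob}$ differ, and the weighting factor and the cross-entropy robust term on perturbed inputs below are assumptions, not the paper's method.

```python
import torch.nn.functional as F

def joint_rt_style_loss(model, x_clean, x_perturbed, y, lam=1.0):
    """Schematic joint objective: standard risk on clean inputs plus a weighted
    robust risk on perturbed inputs (a stand-in, not the paper's exact RT loss)."""
    r_stand = F.cross_entropy(model(x_clean), y)      # standard classification risk
    r_rob = F.cross_entropy(model(x_perturbed), y)    # risk under perturbed inputs
    return r_stand + lam * r_rob
```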
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.