Adversarial training may be a double-edged sword
- URL: http://arxiv.org/abs/2107.11671v1
- Date: Sat, 24 Jul 2021 19:09:16 GMT
- Title: Adversarial training may be a double-edged sword
- Authors: Ali Rahmati, Seyed-Mohsen Moosavi-Dezfooli, Huaiyu Dai
- Abstract summary: We show that some geometric consequences of adversarial training on the decision boundary of deep networks give an edge to certain types of black-box attacks.
In particular, we define a metric called robustness gain to show that while adversarial training is an effective method to dramatically improve the robustness in white-box scenarios, it may not provide such a good robustness gain against the more realistic decision-based black-box attacks.
- Score: 50.09831237090801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial training has been shown as an effective approach to improve the
robustness of image classifiers against white-box attacks. However, its
effectiveness against black-box attacks is more nuanced. In this work, we
demonstrate that some geometric consequences of adversarial training on the
decision boundary of deep networks give an edge to certain types of black-box
attacks. In particular, we define a metric called robustness gain to show that
while adversarial training is an effective method to dramatically improve the
robustness in white-box scenarios, it may not provide such a good robustness
gain against the more realistic decision-based black-box attacks. Moreover, we
show that even the minimal perturbation white-box attacks can converge faster
against adversarially-trained neural networks compared to the regular ones.
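As a rough illustration of how a robustness-gain-style comparison could be computed (the paper's exact definition of the metric is not reproduced here), the sketch below compares the median minimal perturbation norm an attack needs against a standard model versus an adversarially trained one; all names and numbers are illustrative placeholders, not results from the paper.

```python
import numpy as np

def robustness_gain_proxy(eps_standard, eps_robust):
    """Hypothetical proxy for robustness gain: ratio of the median minimal
    perturbation norm needed to fool the adversarially trained model to the
    median norm needed to fool the standard model, under the same attack.
    Values near 1 mean the attack is barely hindered by adversarial training."""
    return np.median(eps_robust) / np.median(eps_standard)

# Illustrative (made-up) per-sample minimal perturbation norms found by a
# white-box attack and by a decision-based black-box attack on both models.
white_std, white_rob = np.array([0.5, 0.7, 0.6]), np.array([2.0, 2.3, 1.9])
black_std, black_rob = np.array([0.9, 1.1, 1.0]), np.array([1.1, 1.3, 1.2])

print("white-box gain:", robustness_gain_proxy(white_std, white_rob))   # large gain
print("black-box gain:", robustness_gain_proxy(black_std, black_rob))   # modest gain
```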
Related papers
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: inputs close to the original image that nevertheless cause the model to misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time.
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
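A minimal sketch of this kind of defense, assuming a PyTorch forward hook that injects fresh Gaussian noise into one intermediate layer at inference time; the model, the chosen layer, and the noise scale are placeholder choices rather than the paper's configuration.

```python
import torch
import torchvision

# Sketch: add random noise to hidden features at an intermediate layer at
# inference time, so each query sees a slightly different model.
model = torchvision.models.resnet18(weights=None).eval()
noise_std = 0.05  # assumed noise scale

def add_feature_noise(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return output + noise_std * torch.randn_like(output)

# Hook an intermediate block (here, illustratively, the third residual stage).
hook = model.layer3.register_forward_hook(add_feature_noise)

x = torch.rand(1, 3, 224, 224)   # placeholder input image
with torch.no_grad():
    logits = model(x)
print(logits.argmax(dim=1))
hook.remove()
```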
arXiv Detail & Related papers (2023-10-01T03:53:23Z)
- Saliency Attack: Towards Imperceptible Black-box Adversarial Attack [35.897117965803666]
We propose to restrict perturbations to a small salient region to generate adversarial examples that can hardly be perceived.
We also propose the Saliency Attack, a new black-box attack aiming to refine the perturbations in the salient region to achieve even better imperceptibility.
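A minimal sketch of restricting a perturbation to a salient region, assuming a precomputed binary saliency mask; the mask and the perturbation below are random placeholders, not the refinement procedure the Saliency Attack actually uses.

```python
import torch

x = torch.rand(1, 3, 224, 224)                               # clean image
saliency_mask = (torch.rand(1, 1, 224, 224) > 0.9).float()   # hypothetical salient-region mask
delta = 0.03 * torch.randn_like(x)                           # candidate perturbation

# Perturb only the salient pixels, then keep the result a valid image.
x_adv = (x + delta * saliency_mask).clamp(0.0, 1.0)
```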
arXiv Detail & Related papers (2022-06-04T03:56:07Z)
- Boosting Black-Box Adversarial Attacks with Meta Learning [0.0]
We propose a hybrid attack method which trains meta adversarial perturbations (MAPs) on surrogate models and performs black-box attacks by estimating gradients of the models.
Our method not only improves attack success rates but also reduces the number of queries compared to other methods.
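A rough sketch of the black-box phase, assuming a pre-trained meta perturbation as the starting point and NES-style finite-difference gradient estimation through the target model's query interface; query_loss, the meta-perturbation initializer, and all hyperparameters are hypothetical stand-ins.

```python
import torch

def query_loss(x_adv):
    # Placeholder: in practice this would query the black-box target model
    # and return a loss computed from its response.
    return torch.rand(())

def estimate_gradient(x_adv, n_samples=10, sigma=0.01):
    # Antithetic finite-difference gradient estimate from paired queries.
    grad = torch.zeros_like(x_adv)
    for _ in range(n_samples):
        u = torch.randn_like(x_adv)
        grad += (query_loss(x_adv + sigma * u) - query_loss(x_adv - sigma * u)) * u
    return grad / (2 * sigma * n_samples)

x = torch.rand(1, 3, 224, 224)            # clean image
map_init = 0.03 * torch.randn_like(x)     # stand-in for a MAP trained on surrogates
delta, step, eps = map_init.clone(), 0.01, 0.05
for _ in range(20):                       # a few query-based refinement steps
    delta = (delta + step * estimate_gradient(x + delta).sign()).clamp(-eps, eps)
```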
arXiv Detail & Related papers (2022-03-28T09:32:48Z)
- Parallel Rectangle Flip Attack: A Query-based Black-box Attack against Object Detection [89.08832589750003]
We propose a Parallel Rectangle Flip Attack (PRFA) via random search to avoid sub-optimal detection near the attacked region.
Our method can effectively and efficiently attack various popular object detectors, including anchor-based and anchor-free, and generate transferable adversarial examples.
arXiv Detail & Related papers (2022-01-22T06:00:17Z)
- Saliency Diversified Deep Ensemble for Robustness to Adversaries [1.9659095632676094]
This work proposes a novel diversity-promoting learning approach for deep ensembles.
The idea is to promote saliency map diversity (SMD) on ensemble members to prevent the attacker from targeting all ensemble members at once.
We empirically show a reduced transferability between ensemble members and improved performance compared to the state-of-the-art ensemble defense.
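One plausible way to implement such a saliency-diversity term, assuming saliency is taken as the input gradient of each member's loss and diversity is encouraged by penalizing pairwise cosine similarity; the actual loss used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def smd_penalty(members, x, y):
    """Hypothetical saliency-map-diversity penalty for an ensemble.
    `members` is a list of models; the returned term would be added to the
    usual ensemble classification loss during training."""
    saliencies = []
    for m in members:
        xg = x.clone().requires_grad_(True)
        loss = F.cross_entropy(m(xg), y)
        # Input gradient as a (batch, features) saliency map, kept differentiable.
        saliencies.append(torch.autograd.grad(loss, xg, create_graph=True)[0].flatten(1))
    penalty = 0.0
    for i in range(len(saliencies)):
        for j in range(i + 1, len(saliencies)):
            penalty = penalty + F.cosine_similarity(saliencies[i], saliencies[j], dim=1).mean()
    return penalty
```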
arXiv Detail & Related papers (2021-12-07T10:18:43Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
The proposed anti-adversary layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
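A minimal sketch of this idea: before classifying, push the input a few signed-gradient steps toward higher confidence in the model's own prediction, i.e. opposite to the direction a gradient-based attack would take. Step size and step count are illustrative choices.

```python
import torch
import torch.nn.functional as F

def anti_adversary(model, x, steps=2, alpha=0.01):
    # The model's current guess defines the direction to reinforce.
    y_pred = model(x).argmax(dim=1)
    x_aa = x.clone()
    for _ in range(steps):
        x_aa = x_aa.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_aa), y_pred)
        grad, = torch.autograd.grad(loss, x_aa)
        # Descend the loss on the predicted class: the anti-adversarial direction.
        x_aa = (x_aa - alpha * grad.sign()).clamp(0.0, 1.0)
    return x_aa.detach()

# Usage: logits = model(anti_adversary(model, x_input))
```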
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
- Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks [11.429509031463892]
Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification.
In this work, we study two different approaches for defending against black-box patch attacks.
We find that adversarial training has limited effectiveness against state-of-the-art location-optimized patch attacks.
arXiv Detail & Related papers (2020-12-01T15:04:23Z)
- AdvMind: Inferring Adversary Intent of Black-Box Attacks [66.19339307119232]
We present AdvMind, a new class of estimation models that infer the adversary intent of black-box adversarial attacks in a robust manner.
On average AdvMind detects the adversary intent with over 75% accuracy after observing less than 3 query batches.
arXiv Detail & Related papers (2020-06-16T22:04:31Z)
- Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data [96.92837098305898]
Black-box attacks aim to craft adversarial perturbations by querying input-output pairs of machine learning models.
Black-box attacks often suffer from the issue of query inefficiency due to the high dimensionality of the input space.
We propose a novel technique called the spanning attack, which constrains adversarial perturbations in a low-dimensional subspace via spanning an auxiliary unlabeled dataset.
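A minimal sketch of the subspace constraint, assuming the perturbation is forced to lie in the span of a few auxiliary unlabeled images via an orthonormal basis; the query-based attack that proposes the coefficients is omitted, and the auxiliary data below is a random placeholder.

```python
import torch

aux = torch.rand(32, 3 * 32 * 32)      # placeholder: 32 unlabeled auxiliary images, flattened
basis, _ = torch.linalg.qr(aux.T)      # orthonormal basis of their span, shape (d, k)

def perturbation_from_coeffs(coeffs):
    # Any perturbation the attack tries lives in the low-dimensional span
    # of the auxiliary data, so the search space has k dimensions, not d.
    return (basis @ coeffs).reshape(3, 32, 32)

coeffs = 0.01 * torch.randn(basis.shape[1])   # a candidate proposed by the base attack
x = torch.rand(3, 32, 32)                     # clean image
x_adv = (x + perturbation_from_coeffs(coeffs)).clamp(0.0, 1.0)
```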
arXiv Detail & Related papers (2020-05-11T05:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above (including all listed papers) and is not responsible for any consequences of its use.