Towards Transferable Adversarial Attacks on Vision Transformers
- URL: http://arxiv.org/abs/2109.04176v1
- Date: Thu, 9 Sep 2021 11:28:25 GMT
- Title: Towards Transferable Adversarial Attacks on Vision Transformers
- Authors: Zhipeng Wei, Jingjing Chen, Micah Goldblum, Zuxuan Wu, Tom Goldstein,
Yu-Gang Jiang
- Abstract summary: Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
- Score: 110.55845478440807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision transformers (ViTs) have demonstrated impressive performance on a
series of computer vision tasks, yet they still suffer from adversarial
examples. In this paper, we posit that adversarial attacks on transformers
should be specially tailored for their architecture, jointly considering both
patches and self-attention, in order to achieve high transferability. More
specifically, we introduce a dual attack framework, which contains a Pay No
Attention (PNA) attack and a PatchOut attack, to improve the transferability of
adversarial samples across different ViTs. We show that skipping the gradients
of attention during backpropagation can generate adversarial examples with high
transferability. In addition, adversarial perturbations generated by optimizing
randomly sampled subsets of patches at each iteration achieve higher attack
success rates than attacks using all patches. We evaluate the transferability
of attacks on state-of-the-art ViTs, CNNs and robustly trained CNNs. The
results of these experiments demonstrate that the proposed dual attack can
greatly boost transferability between ViTs and from ViTs to CNNs. In addition,
the proposed method can easily be combined with existing transfer methods to
boost performance.
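To make the two components of the abstract concrete, below is a minimal PyTorch sketch under stated assumptions: a PGD-style loop in which a backward hook zeroes the gradient flowing through each block's attention-probability dropout, so the attention map is treated as a constant during backpropagation (the "Pay No Attention" idea), while a random per-iteration patch mask restricts each update to a sampled subset of image patches (the "PatchOut" idea). The module path, hyperparameters, and helper names are illustrative assumptions, not the authors' released implementation.
```python
import torch
import torch.nn.functional as F


def pna_patchout_attack(model, attn_drop_modules, x, y,
                        eps=16 / 255, alpha=2 / 255, steps=10,
                        patch_size=16, keep_ratio=0.5):
    """Sketch of a PNA + PatchOut style transfer attack (assumed setup).

    `attn_drop_modules` is assumed to be the dropout layers applied to the
    attention probabilities in each transformer block; zeroing the gradient
    that flows back through them skips the attention-map gradients, while the
    residual and value paths still carry gradient to the input.
    """
    # PNA: treat attention maps as constants in the backward pass.
    def _zero_grad(module, grad_in, grad_out):
        return tuple(torch.zeros_like(g) if g is not None else None for g in grad_in)

    hooks = [m.register_full_backward_hook(_zero_grad) for m in attn_drop_modules]

    b, _, h, w = x.shape
    ph, pw = h // patch_size, w // patch_size
    delta = torch.zeros_like(x, requires_grad=True)

    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)  # untargeted objective
        loss.backward()

        # PatchOut: update only a random subset of patches this iteration.
        patch_mask = (torch.rand(b, 1, ph, pw, device=x.device) < keep_ratio).float()
        pixel_mask = patch_mask.repeat_interleave(patch_size, 2).repeat_interleave(patch_size, 3)

        with torch.no_grad():
            delta += alpha * delta.grad.sign() * pixel_mask
            delta.clamp_(-eps, eps)                   # L_inf budget
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep the image valid
        delta.grad = None

    for hook in hooks:
        hook.remove()
    return (x + delta).detach()
```
With a timm-style ViT surrogate, the hooked modules could plausibly be gathered as `[blk.attn.attn_drop for blk in model.blocks]` (an assumption about the module layout); the resulting adversarial images would then be fed to held-out ViT or CNN targets to measure transferability.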
Related papers
- Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies the transferability of such an adversarial vulnerability from a pre-trained ViT model to downstream tasks.
We show that DTA achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a large margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z)
- Query-Efficient Hard-Label Black-Box Attack against Vision Transformers [9.086983253339069]
Vision transformers (ViTs) face security risks from adversarial attacks similar to those of deep convolutional neural networks (CNNs).
This article explores the vulnerability of ViTs against adversarial attacks under a black-box scenario.
We propose a novel query-efficient hard-label adversarial attack method called AdvViT.
arXiv Detail & Related papers (2024-06-29T10:09:12Z)
- Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models [52.530286579915284]
We present the first study to investigate the adversarial transferability of vision-language pre-training models.
The transferability degradation is partly caused by the under-utilization of cross-modal interactions.
We propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance.
arXiv Detail & Related papers (2023-07-26T09:19:21Z)
- Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization [32.908816911260615]
Vision transformers (ViTs) have been successfully deployed in a variety of computer vision tasks, but they are still vulnerable to adversarial samples.
Transfer-based attacks use a local model to generate adversarial samples and transfer them directly to attack a target black-box model.
We propose the Token Gradient Regularization (TGR) method to overcome the shortcomings of existing approaches.
arXiv Detail & Related papers (2023-03-28T06:23:17Z)
- Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification [4.843654097048771]
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging.
Recent works have shown that ViTs are also susceptible to adversarial attacks and suffer significant performance degradation under attack.
We propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-08-04T19:02:24Z)
- Improving the Transferability of Adversarial Examples with Restructure Embedded Patches [4.476012751070559]
We attack the unique self-attention mechanism in ViTs by restructuring the embedded patches of the input.
Our method generates adversarial examples on white-box ViTs with higher transferability and higher image quality.
arXiv Detail & Related papers (2022-04-27T03:22:55Z)
- Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies impact ViTs' robustness towards common corruptions.
We demonstrate that overlapping patch embedding and convolutional Feed-Forward Networks (FFNs) boost robustness.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
arXiv Detail & Related papers (2022-04-26T08:22:34Z)
- Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations? [21.32962679185015]
Vision transformers (ViTs) have recently set off a new wave in neural architecture design thanks to their record-breaking performance in vision tasks.
Recent works show that ViTs are more robust against adversarial attacks compared with convolutional neural networks (CNNs).
We propose a dedicated attack framework, dubbed Patch-Fool, that fools the self-attention mechanism by attacking its basic component.
arXiv Detail & Related papers (2022-03-16T04:45:59Z)
- On Improving Adversarial Transferability of Vision Transformers [97.17154635766578]
Vision transformers (ViTs) process input images as sequences of patches via self-attention.
We study the adversarial feature space of ViT models and their transferability.
We introduce two novel strategies specific to the architecture of ViT models.
arXiv Detail & Related papers (2021-06-08T08:20:38Z)
- On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Tested in various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness compared with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)