Towards Efficient Adversarial Training on Vision Transformers
- URL: http://arxiv.org/abs/2207.10498v1
- Date: Thu, 21 Jul 2022 14:23:50 GMT
- Title: Towards Efficient Adversarial Training on Vision Transformers
- Authors: Boxi Wu, Jindong Gu, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu
- Abstract summary: Adversarial training is one of the most effective ways to build robust CNNs.
We propose an efficient Attention Guided Adversarial Training mechanism.
With only 65% of the fast adversarial training time, we match the state-of-the-art results on the challenging ImageNet benchmark.
- Score: 41.6396577241957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformer (ViT), as a powerful alternative to Convolutional Neural
Network (CNN), has received much attention. Recent work has shown that, like
CNNs, ViTs are also vulnerable to adversarial examples. To build robust ViTs, an
intuitive way is to apply adversarial training, since it has been shown to be
one of the most effective ways to build robust CNNs. However, one major
limitation of adversarial training is its heavy computational cost. The
self-attention mechanism adopted by ViTs is a computationally intensive operation
whose expense increases quadratically with the number of input patches, making
adversarial training on ViTs even more time-consuming. In this work, we first
comprehensively study fast adversarial training on a variety of vision
transformers and illustrate the relationship between efficiency and
robustness. Then, to expedite adversarial training on ViTs, we propose an
efficient Attention Guided Adversarial Training mechanism. Specifically,
exploiting the structure of self-attention, we actively remove certain patch
embeddings of each layer with an attention-guided dropping strategy during
adversarial training. The slimmed self-attention modules accelerate the
adversarial training on ViTs significantly. With only 65% of the fast
adversarial training time, we match the state-of-the-art results on the
challenging ImageNet benchmark.
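To make the attention-guided dropping strategy concrete, below is a minimal sketch, assuming patch importance is ranked by the class token's attention (averaged over heads) and a fixed keep ratio; the authors' exact scoring rule and per-layer schedule may differ.

```python
import torch

def drop_tokens_by_attention(tokens, attn, keep_ratio=0.65):
    """tokens: (B, 1+N, D) patch embeddings with a leading class token.
    attn: (B, heads, 1+N, 1+N) attention weights from the current layer."""
    B, n_tokens, D = tokens.shape
    n_keep = max(1, int((n_tokens - 1) * keep_ratio))
    # Rank patches by the class token's attention to them, averaged over heads.
    cls_attn = attn.mean(dim=1)[:, 0, 1:]                  # (B, N)
    keep_idx = cls_attn.topk(n_keep, dim=1).indices.sort(dim=1).values
    kept = tokens[:, 1:, :].gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return torch.cat([tokens[:, :1, :], kept], dim=1)      # (B, 1+n_keep, D)
```

Since self-attention cost grows quadratically with the number of tokens, dropping a fraction of patch embeddings at each layer shrinks the dominant term of both the forward and backward passes of every attack step.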
Related papers
- MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness [31.603115393528746]
Building robust Vision Transformers (ViTs) is highly dependent on dedicated Adversarial Training (AT) strategies.
We provide a novel theoretical Mutual Information (MI) analysis of autoencoder-based self-supervised pre-training.
We propose a masked autoencoder-based pre-training method, MIMIR, that employs an MI penalty to facilitate the adversarial training of ViTs.
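The summary leaves the MI penalty unspecified; as a rough sketch of the overall shape only (masked reconstruction loss plus a weighted MI term), where the MI estimator, the mask convention, and lambda_mi are all assumptions for illustration:

```python
import torch

def mimir_style_loss(recon, target, mask, mi_penalty, lambda_mi=0.1):
    """recon/target: (B, C, H, W); mask: 1.0 where patches were masked out.
    mi_penalty: scalar tensor from any pluggable MI estimator (assumed)."""
    recon_loss = torch.sum((recon - target) ** 2 * mask) / torch.clamp(mask.sum(), min=1.0)
    return recon_loss + lambda_mi * mi_penalty  # MI penalty as per the summary
```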
arXiv Detail & Related papers (2023-12-08T10:50:02Z)
- Experts Weights Averaging: A New General Training Scheme for Vision Transformers [57.62386892571636]
We propose a training scheme for Vision Transformers (ViTs) that achieves performance improvement without increasing inference cost.
During training, we replace some Feed-Forward Networks (FFNs) of the ViT with specially designed, more efficient MoEs.
After training, we convert each MoE into an FFN by averaging the experts, transforming the model back into original ViT for inference.
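The conversion step is simple to sketch, assuming every expert in an MoE is an FFN with identical architecture (the routing used during training is omitted here):

```python
import copy
import torch

@torch.no_grad()
def average_experts_into_ffn(experts):
    """experts: list of identically-shaped FFN modules -> one merged FFN."""
    merged = copy.deepcopy(experts[0])
    merged_state = merged.state_dict()
    for name in merged_state:
        # Average each parameter/buffer elementwise across all experts.
        merged_state[name] = torch.stack(
            [e.state_dict()[name] for e in experts]).mean(dim=0)
    merged.load_state_dict(merged_state)
    return merged
```

Because the averaging happens once after training, inference runs a plain ViT with no added cost, which is the point of the scheme.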
arXiv Detail & Related papers (2023-08-11T12:05:12Z)
- Adaptive Attention Link-based Regularization for Vision Transformers [6.6798113365140015]
We present a regularization technique to improve the training efficiency of Vision Transformers (ViT).
The trainable links are referred to as the attention augmentation module, which is trained simultaneously with the ViT.
We can extract the relevant relationship between each CNN activation map and each ViT attention head, and based on this, we also propose an advanced attention augmentation module.
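As an illustration of the idea only (not the paper's module): a trainable link matrix can mix CNN activation maps into per-head targets that regularize the ViT's attention maps. The shapes and the softmax parameterization below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLinkReg(nn.Module):
    """Trainable links from n_cnn_maps CNN activation maps to n_heads targets."""
    def __init__(self, n_cnn_maps, n_heads):
        super().__init__()
        self.link = nn.Parameter(torch.zeros(n_heads, n_cnn_maps))

    def forward(self, vit_attn, cnn_maps):
        """vit_attn: (B, heads, N) class-token attention over N patches.
        cnn_maps: (B, n_cnn_maps, N) CNN activations resized to the patch grid."""
        weights = F.softmax(self.link, dim=-1)                # (heads, n_cnn_maps)
        target = torch.einsum('hc,bcn->bhn', weights, cnn_maps)
        return F.mse_loss(vit_attn, target)                   # auxiliary loss term
```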
arXiv Detail & Related papers (2022-11-25T01:26:43Z)
- When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture [32.260596998171835]
Adversarial training is still required for ViTs to defend against adversarial attacks.
We find that pre-training and SGD are necessary for ViTs' adversarial training.
Our code is available at https://github.com/mo666666/When-Adversarial-Training-Meets-Vision-Transformers.
arXiv Detail & Related papers (2022-10-14T05:37:20Z)
- A Light Recipe to Train Robust Vision Transformers [34.51642006926379]
We show that Vision Transformers (ViTs) can serve as an underlying architecture for improving the robustness of machine learning models against evasion attacks.
We achieve this objective using a custom adversarial training recipe, discovered using rigorous ablation studies on a subset of the ImageNet dataset.
We show that our recipe generalizes to different classes of ViT architectures and large-scale models on full ImageNet-1k.
arXiv Detail & Related papers (2022-09-15T16:00:04Z)
- Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations? [21.32962679185015]
Vision transformers (ViTs) have recently set off a new wave in neural architecture design thanks to their record-breaking performance in vision tasks.
Recent works show that ViTs are more robust against adversarial attacks compared with convolutional neural networks (CNNs).
We propose a dedicated attack framework, dubbed Patch-Fool, that fools the self-attention mechanism by attacking its basic component.
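A rough sketch of the single-patch attack surface follows; the paper's attention-aware loss and patch-selection rule are not reproduced here, so a plain cross-entropy objective and a given patch location stand in for them.

```python
import torch
import torch.nn.functional as F

def single_patch_attack(model, x, y, rows, cols, steps=20, alpha=0.05):
    """Perturb only x[..., rows, cols] (one patch). x: (B, C, H, W) in [0, 1].
    rows/cols are slice objects locating the target patch in pixel space."""
    delta = torch.zeros_like(x[..., rows, cols], requires_grad=True)
    for _ in range(steps):
        x_adv = x.clone()
        x_adv[..., rows, cols] = x[..., rows, cols] + delta
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()   # perturbation confined to the patch
    x_adv = x.clone()
    x_adv[..., rows, cols] = (x[..., rows, cols] + delta).clamp(0, 1)
    return x_adv.detach()
```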
arXiv Detail & Related papers (2022-03-16T04:45:59Z)
- Self-slimmed Vision Transformer [52.67243496139175]
Vision transformers (ViTs) have become popular architectures and have outperformed convolutional neural networks (CNNs) on various vision tasks.
We propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT.
Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs.
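A minimal illustration of token slimming (not the paper's TSM): a learned, normalized mixing matrix aggregates N tokens into a smaller set M, so downstream attention layers operate on fewer tokens.

```python
import torch
import torch.nn as nn

class TokenSlim(nn.Module):
    """Aggregate n_in tokens into n_out < n_in 'slimmed' tokens."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.mix = nn.Parameter(torch.randn(n_out, n_in) * 0.02)

    def forward(self, tokens):                  # tokens: (B, N, D)
        weights = self.mix.softmax(dim=-1)      # each output mixes all inputs
        return torch.einsum('mn,bnd->bmd', weights, tokens)  # (B, M, D)
```

With M < N tokens downstream, the quadratic attention cost drops by roughly a factor of (M/N)^2.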
arXiv Detail & Related papers (2021-11-24T16:48:57Z)
- Towards Transferable Adversarial Attacks on Vision Transformers [110.55845478440807]
Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
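The Pay No Attention idea is commonly summarized as not routing gradients through the attention map during the attack's backward pass; a sketch of an attention forward that treats the softmax map as a constant in backward (the surrounding module wiring is assumed):

```python
import torch

def pna_attention(q, k, v, scale):
    """q, k, v: (B, heads, N, d). Output matches standard attention in the
    forward pass, but its backward ignores the attention-map branch."""
    attn = (q @ k.transpose(-2, -1)) * scale
    attn = attn.softmax(dim=-1)
    attn = attn.detach()          # PNA: pay no attention in the backward pass
    return attn @ v
```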
arXiv Detail & Related papers (2021-09-09T11:28:25Z)
- On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Testing various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness than convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)
- Towards Understanding Fast Adversarial Training [91.8060431517248]
We conduct experiments to understand the behavior of fast adversarial training.
We show that the key to its success is the ability to recover from overfitting to weak attacks.
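For context, "fast adversarial training" generally means a single-step (FGSM-style) attack with a random start; a generic sketch of one training step, with illustrative eps/alpha values rather than the paper's settings:

```python
import torch
import torch.nn.functional as F

def fast_at_step(model, opt, x, y, eps=8/255, alpha=10/255):
    """One FGSM-with-random-start adversarial training step; x in [0, 1]."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)          # single attack step
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    x_adv = (x + delta).clamp(0, 1)
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()          # update on adv batch
    opt.step()
```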
arXiv Detail & Related papers (2020-06-04T18:19:43Z)