Towards Efficient Adversarial Training on Vision Transformers
- URL: http://arxiv.org/abs/2207.10498v1
- Date: Thu, 21 Jul 2022 14:23:50 GMT
- Title: Towards Efficient Adversarial Training on Vision Transformers
- Authors: Boxi Wu, Jindong Gu, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu
- Abstract summary: Adversarial training is one of the most effective ways to build robust CNNs.
We propose an efficient Attention Guided Adversarial Training mechanism.
With only 65% of the fast adversarial training time, we match the state-of-the-art results on the challenging ImageNet benchmark.
- Score: 41.6396577241957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformer (ViT), as a powerful alternative to Convolutional Neural
Network (CNN), has received much attention. Recent work has shown that, like
CNNs, ViTs are also vulnerable to adversarial examples. To build robust ViTs, an
intuitive way is to apply adversarial training, since it has been shown to be
one of the most effective ways to build robust CNNs. However, one major
limitation of adversarial training is its heavy computational cost. The
self-attention mechanism adopted by ViTs is a computationally intensive operation
whose expense increases quadratically with the number of input patches, making
adversarial training on ViTs even more time-consuming. In this work, we first
comprehensively study fast adversarial training on a variety of vision
transformers and illustrate the relationship between efficiency and
robustness. Then, to expedite adversarial training on ViTs, we propose an
efficient Attention Guided Adversarial Training mechanism. Specifically,
exploiting the structure of self-attention, we actively remove certain patch
embeddings of each layer with an attention-guided dropping strategy during
adversarial training. The slimmed self-attention modules accelerate the
adversarial training on ViTs significantly. With only 65% of the fast
adversarial training time, we match the state-of-the-art results on the
challenging ImageNet benchmark.
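To make the attention-guided dropping strategy concrete, below is a minimal sketch, assuming patch importance is ranked by the class token's attention (averaged over heads) and a fixed keep ratio; the authors' exact scoring rule and per-layer schedule may differ.

```python
import torch

def drop_tokens_by_attention(tokens, attn, keep_ratio=0.65):
    """tokens: (B, 1+N, D) patch embeddings with a leading class token.
    attn: (B, heads, 1+N, 1+N) attention weights from the current layer."""
    B, n_tokens, D = tokens.shape
    n_keep = max(1, int((n_tokens - 1) * keep_ratio))
    # Rank patches by the class token's attention to them, averaged over heads.
    cls_attn = attn.mean(dim=1)[:, 0, 1:]                  # (B, N)
    keep_idx = cls_attn.topk(n_keep, dim=1).indices.sort(dim=1).values
    kept = tokens[:, 1:, :].gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return torch.cat([tokens[:, :1, :], kept], dim=1)      # (B, 1+n_keep, D)
```

Since self-attention cost grows quadratically with the number of tokens, dropping a fraction of patch embeddings at each layer shrinks the dominant term of both the forward and backward passes of every attack step.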
Related papers
- MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness [31.603115393528746]
Building robust Vision Transformers (ViTs) is highly dependent on dedicated Adversarial Training (AT) strategies.
We provide a novel theoretical Mutual Information (MI) analysis of autoencoder-based self-supervised pre-training.
We propose a masked autoencoder-based pre-training method, MIMIR, that employs an MI penalty to facilitate the adversarial training of ViTs.
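The summary leaves the MI penalty unspecified; as a rough sketch of the overall shape only (masked reconstruction loss plus a weighted MI term), where the MI estimator, the mask convention, and lambda_mi are all assumptions for illustration:

```python
import torch

def mimir_style_loss(recon, target, mask, mi_penalty, lambda_mi=0.1):
    """recon/target: (B, C, H, W); mask: 1.0 where patches were masked out.
    mi_penalty: scalar tensor from any pluggable MI estimator (assumed)."""
    recon_loss = torch.sum((recon - target) ** 2 * mask) / torch.clamp(mask.sum(), min=1.0)
    return recon_loss + lambda_mi * mi_penalty  # MI penalty as per the summary
```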
arXiv Detail & Related papers (2023-12-08T10:50:02Z)
- Experts Weights Averaging: A New General Training Scheme for Vision Transformers [57.62386892571636]
We propose a training scheme for Vision Transformers (ViTs) that achieves performance improvement without increasing inference cost.
During training, we replace some Feed-Forward Networks (FFNs) of the ViT with specially designed, more efficient MoEs.
After training, we convert each MoE into an FFN by averaging the experts, transforming the model back into original ViT for inference.
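The conversion step is simple to sketch, assuming every expert in an MoE is an FFN with identical architecture (the routing used during training is omitted here):

```python
import copy
import torch

@torch.no_grad()
def average_experts_into_ffn(experts):
    """experts: list of identically-shaped FFN modules -> one merged FFN."""
    merged = copy.deepcopy(experts[0])
    merged_state = merged.state_dict()
    for name in merged_state:
        # Average each parameter/buffer elementwise across all experts.
        merged_state[name] = torch.stack(
            [e.state_dict()[name] for e in experts]).mean(dim=0)
    merged.load_state_dict(merged_state)
    return merged
```

Because the averaging happens once after training, inference runs a plain ViT with no added cost, which is the point of the scheme.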
arXiv Detail & Related papers (2023-08-11T12:05:12Z)
- Adaptive Attention Link-based Regularization for Vision Transformers [6.6798113365140015]
We present a regularization technique to improve the training efficiency of Vision Transformers (ViT).
The trainable links are referred to as the attention augmentation module, which is trained simultaneously with the ViT.
We can extract the relevant relationship between each CNN activation map and each ViT attention head, and based on this, we also propose an advanced attention augmentation module.
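As an illustration of the idea only (not the paper's module): a trainable link matrix can mix CNN activation maps into per-head targets that regularize the ViT's attention maps. The shapes and the softmax parameterization below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLinkReg(nn.Module):
    """Trainable links from n_cnn_maps CNN activation maps to n_heads targets."""
    def __init__(self, n_cnn_maps, n_heads):
        super().__init__()
        self.link = nn.Parameter(torch.zeros(n_heads, n_cnn_maps))

    def forward(self, vit_attn, cnn_maps):
        """vit_attn: (B, heads, N) class-token attention over N patches.
        cnn_maps: (B, n_cnn_maps, N) CNN activations resized to the patch grid."""
        weights = F.softmax(self.link, dim=-1)                # (heads, n_cnn_maps)
        target = torch.einsum('hc,bcn->bhn', weights, cnn_maps)
        return F.mse_loss(vit_attn, target)                   # auxiliary loss term
```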
arXiv Detail & Related papers (2022-11-25T01:26:43Z)
- When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture [32.260596998171835]
Adversarial training is still required for ViTs to defend against adversarial attacks.
We find that pre-training and SGD are necessary for ViTs' adversarial training.
Our code is available at https://github.com/mo666666/When-Adversarial-Training-Meets-Vision-Transformers.
arXiv Detail & Related papers (2022-10-14T05:37:20Z)
- A Light Recipe to Train Robust Vision Transformers [34.51642006926379]
We show that Vision Transformers (ViTs) can serve as an underlying architecture for improving the robustness of machine learning models against evasion attacks.
We achieve this objective using a custom adversarial training recipe, discovered using rigorous ablation studies on a subset of the ImageNet dataset.
We show that our recipe generalizes to different classes of ViT architectures and large-scale models on full ImageNet-1k.
arXiv Detail & Related papers (2022-09-15T16:00:04Z)
- Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations? [21.32962679185015]
Vision transformers (ViTs) have recently set off a new wave in neural architecture design thanks to their record-breaking performance in vision tasks.
Recent works show that ViTs are more robust against adversarial attacks compared with convolutional neural networks (CNNs).
We propose a dedicated attack framework, dubbed Patch-Fool, that fools the self-attention mechanism by attacking its basic component.
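A rough sketch of the single-patch attack surface follows; the paper's attention-aware loss and patch-selection rule are not reproduced here, so a plain cross-entropy objective and a given patch location stand in for them.

```python
import torch
import torch.nn.functional as F

def single_patch_attack(model, x, y, rows, cols, steps=20, alpha=0.05):
    """Perturb only x[..., rows, cols] (one patch). x: (B, C, H, W) in [0, 1].
    rows/cols are slice objects locating the target patch in pixel space."""
    delta = torch.zeros_like(x[..., rows, cols], requires_grad=True)
    for _ in range(steps):
        x_adv = x.clone()
        x_adv[..., rows, cols] = x[..., rows, cols] + delta
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()   # perturbation confined to the patch
    x_adv = x.clone()
    x_adv[..., rows, cols] = (x[..., rows, cols] + delta).clamp(0, 1)
    return x_adv.detach()
```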
arXiv Detail & Related papers (2022-03-16T04:45:59Z)
- Self-slimmed Vision Transformer [52.67243496139175]
Vision transformers (ViTs) have become popular architectures and have outperformed convolutional neural networks (CNNs) on various vision tasks.
We propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT.
Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs.
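A minimal illustration of token slimming (not the paper's TSM): a learned, normalized mixing matrix aggregates N tokens into a smaller set M, so downstream attention layers operate on fewer tokens.

```python
import torch
import torch.nn as nn

class TokenSlim(nn.Module):
    """Aggregate n_in tokens into n_out < n_in 'slimmed' tokens."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.mix = nn.Parameter(torch.randn(n_out, n_in) * 0.02)

    def forward(self, tokens):                  # tokens: (B, N, D)
        weights = self.mix.softmax(dim=-1)      # each output mixes all inputs
        return torch.einsum('mn,bnd->bmd', weights, tokens)  # (B, M, D)
```

With M < N tokens downstream, the quadratic attention cost drops by roughly a factor of (M/N)^2.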
arXiv Detail & Related papers (2021-11-24T16:48:57Z)
- Towards Transferable Adversarial Attacks on Vision Transformers [110.55845478440807]
Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
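The Pay No Attention idea is commonly summarized as not routing gradients through the attention map during the attack's backward pass; a sketch of an attention forward that treats the softmax map as a constant in backward (the surrounding module wiring is assumed):

```python
import torch

def pna_attention(q, k, v, scale):
    """q, k, v: (B, heads, N, d). Output matches standard attention in the
    forward pass, but its backward ignores the attention-map branch."""
    attn = (q @ k.transpose(-2, -1)) * scale
    attn = attn.softmax(dim=-1)
    attn = attn.detach()          # PNA: pay no attention in the backward pass
    return attn @ v
```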
arXiv Detail & Related papers (2021-09-09T11:28:25Z)
- On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Testing various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness than convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)
- Towards Understanding Fast Adversarial Training [91.8060431517248]
We conduct experiments to understand the behavior of fast adversarial training.
We show that the key to its success is the ability to recover from overfitting to weak attacks.
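For context, "fast adversarial training" generally means a single-step (FGSM-style) attack with a random start; a generic sketch of one training step, with illustrative eps/alpha values rather than the paper's settings:

```python
import torch
import torch.nn.functional as F

def fast_at_step(model, opt, x, y, eps=8/255, alpha=10/255):
    """One FGSM-with-random-start adversarial training step; x in [0, 1]."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)          # single attack step
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    x_adv = (x + delta).clamp(0, 1)
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()          # update on adv batch
    opt.step()
```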
arXiv Detail & Related papers (2020-06-04T18:19:43Z)