Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
- URL: http://arxiv.org/abs/2503.15404v1
- Date: Wed, 19 Mar 2025 16:44:23 GMT
- Title: Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
- Authors: Yuchen Ren, Zhengyu Zhao, Chenhao Lin, Bo Yang, Lu Zhou, Zhe Liu, Chao Shen
- Abstract summary: We refine two key modules of ViTs: attention maps and token embeddings. For attention maps, we propose Attention Map Diversification (AMD), which diversifies certain attention maps and also implicitly imposes beneficial gradient vanishing during backward propagation. We conduct extensive experiments with adversarial examples transferred from ViTs to various CNNs and ViTs, demonstrating that our FPR outperforms the current best (backward) surrogate refinement by up to 7.0% on average.
- Score: 17.496082209866923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformers (ViTs) have been widely applied in various computer vision and vision-language tasks. To gain insights into their robustness in practical scenarios, transferable adversarial examples on ViTs have been extensively studied. A typical approach to improving adversarial transferability is by refining the surrogate model. However, existing work on ViTs has restricted their surrogate refinement to backward propagation. In this work, we instead focus on Forward Propagation Refinement (FPR) and specifically refine two key modules of ViTs: attention maps and token embeddings. For attention maps, we propose Attention Map Diversification (AMD), which diversifies certain attention maps and also implicitly imposes beneficial gradient vanishing during backward propagation. For token embeddings, we propose Momentum Token Embedding (MTE), which accumulates historical token embeddings to stabilize the forward updates in both the Attention and MLP blocks. We conduct extensive experiments with adversarial examples transferred from ViTs to various CNNs and ViTs, demonstrating that our FPR outperforms the current best (backward) surrogate refinement by up to 7.0% on average. We also validate its superiority against popular defenses and its compatibility with other transfer methods. Codes and appendix are available at https://github.com/RYC-98/FPR.
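The abstract describes AMD and MTE only at a high level; the exact operations are given in the paper and the official repository. Below is a minimal PyTorch sketch of how such forward-propagation refinements could be wired into a ViT block. The random rescaling used for AMD and the exponential-moving-average update used for MTE are illustrative assumptions, and the class names (DiversifiedAttention, RefinedBlock) are hypothetical rather than taken from the authors' code.

```python
# Hypothetical sketch of Forward Propagation Refinement (FPR) on a simplified ViT block.
# AMD and MTE below are assumed forms inferred from the abstract, not the paper's exact method.
import torch
import torch.nn as nn


class DiversifiedAttention(nn.Module):
    """Multi-head self-attention whose attention maps are randomly rescaled (AMD, assumed form)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, diversify: bool = True) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]                      # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)                           # attention maps: (B, heads, N, N)
        if diversify:
            # AMD (assumed form): multiply the attention map by random factors so that each
            # forward pass sees a diversified map; the factors also shrink the map's gradient
            # in the backward pass.
            attn = attn * torch.rand_like(attn)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class RefinedBlock(nn.Module):
    """Simplified ViT block combining AMD (inside the attention) with MTE (token momentum)."""

    def __init__(self, dim: int, num_heads: int = 8, momentum: float = 0.9):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = DiversifiedAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.momentum = momentum
        self.token_history = None  # token embeddings accumulated across attack iterations

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # MTE (assumed form): blend the current token embeddings with their running history
        # so that the inputs to the Attention and MLP blocks evolve smoothly across iterations.
        if self.token_history is not None and self.token_history.shape == x.shape:
            x = self.momentum * self.token_history + (1.0 - self.momentum) * x
        self.token_history = x.detach()
        x = x + self.attn(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    block = RefinedBlock(dim=192, num_heads=3)                # DeiT-Tiny-sized tokens
    tokens = torch.randn(2, 197, 192, requires_grad=True)
    loss = block(tokens).sum()
    loss.backward()                                           # gradients pass through AMD/MTE
    print(tokens.grad.abs().mean().item())
```

In a transfer attack, blocks like this would replace the standard blocks of the surrogate ViT, so every forward pass, and therefore every gradient used to update the adversarial perturbation, goes through diversified attention maps and momentum-stabilized token embeddings.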
Related papers
- Multi-Attribute Vision Transformers are Efficient and Robust Learners [4.53923275658276]
Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs).
We present a straightforward yet effective strategy for training various attributes through a single ViT network as distinct tasks.
We assess the resilience of multi-attribute ViTs against adversarial attacks and compare their performance against ViTs designed for single attributes.
arXiv Detail & Related papers (2024-02-12T21:31:13Z)
- A Close Look at Spatial Modeling: From Attention to Convolution [70.5571582194057]
Vision Transformers have shown great promise recently for many vision tasks due to the insightful architecture design and attention mechanism.
We generalize the self-attention formulation to abstract a query-irrelevant global context directly and integrate this global context into convolutions.
With fewer than 14M parameters, our FCViT-S12 outperforms the related ResT-Lite by 3.7% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-12-23T19:13:43Z)
- Self-Distilled Vision Transformer for Domain Generalization [58.76055100157651]
Vision transformers (ViTs) are challenging the supremacy of CNNs on standard benchmarks.
We propose a simple DG approach for ViTs, coined as self-distillation for ViTs.
We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones in five challenging datasets.
arXiv Detail & Related papers (2022-07-25T17:57:05Z)
- Improving the Transferability of Adversarial Examples with Restructure Embedded Patches [4.476012751070559]
We attack the unique self-attention mechanism in ViTs by restructuring the embedded patches of the input.
Our method generates adversarial examples on white-box ViTs with higher transferability and higher image quality.
arXiv Detail & Related papers (2022-04-27T03:22:55Z)
- The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy [111.49944789602884]
This paper systematically studies the ubiquitous existence of redundancy at all three levels: patch embedding, attention map, and weight space.
We propose corresponding regularizers that encourage representation diversity and coverage at each of those levels, enabling the capture of more discriminative information.
arXiv Detail & Related papers (2022-03-12T04:48:12Z)
- Towards Transferable Adversarial Attacks on Vision Transformers [110.55845478440807]
Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
arXiv Detail & Related papers (2021-09-09T11:28:25Z)
- TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation [54.61786380919243]
Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain.
Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations.
With the recent surge in applying Vision Transformers (ViTs) to vision tasks, the capability of ViTs to adapt cross-domain knowledge remains unexplored in the literature.
arXiv Detail & Related papers (2021-08-12T22:37:43Z)
- On Improving Adversarial Transferability of Vision Transformers [97.17154635766578]
Vision transformers (ViTs) process input images as sequences of patches via self-attention.
We study the adversarial feature space of ViT models and their transferability.
We introduce two novel strategies specific to the architecture of ViT models.
arXiv Detail & Related papers (2021-06-08T08:20:38Z)