Attacking Transformers with Feature Diversity Adversarial Perturbation
- URL: http://arxiv.org/abs/2403.07942v1
- Date: Sun, 10 Mar 2024 00:55:58 GMT
- Title: Attacking Transformers with Feature Diversity Adversarial Perturbation
- Authors: Chenxing Gao, Hang Zhou, Junqing Yu, YuTeng Ye, Jiale Cai, Junle Wang,
Wei Yang
- Abstract summary: We present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models.
Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism overly depends on the low-frequency component of features.
- Score: 19.597912600568026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the mechanisms behind Vision Transformer (ViT), particularly
its vulnerability to adversarial perturbations, is crucial for addressing
challenges in its real-world applications. Existing ViT adversarial attackers
rely on labels to calculate the gradient for perturbation, and exhibit low
transferability to other structures and tasks. In this paper, we present a
label-free white-box attack approach for ViT-based models that exhibits strong
transferability to various black-box models, including most ViT variants, CNNs,
and MLPs, even for models developed for other modalities. Our inspiration
comes from the feature collapse phenomenon in ViTs, where the critical
attention mechanism overly depends on the low-frequency component of features,
causing the features in middle-to-end layers to become increasingly similar and
eventually collapse. We propose the feature diversity attacker to naturally
accelerate this process and achieve remarkable performance and transferability.
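To make the idea concrete, below is a minimal, hedged PyTorch sketch of a label-free perturbation that accelerates feature collapse by minimizing the diversity of middle-to-end layer tokens under an L_inf budget. The use of torchvision's vit_b_16, the hooked layer range, the token-deviation diversity measure, and the PGD hyper-parameters are illustrative assumptions, not the paper's exact formulation (input normalization is also omitted for brevity).

```python
# Hedged sketch: label-free, feature-collapse-accelerating perturbation.
# Diversity measure, layer range, and hyper-parameters are illustrative assumptions.
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1).to(device).eval()
for p in model.parameters():
    p.requires_grad_(False)

# Record token features from middle-to-end encoder blocks via forward hooks.
features = []
def save_tokens(_module, _inputs, output):
    features.append(output)                      # shape (B, num_tokens, dim)

hooks = [blk.register_forward_hook(save_tokens)
         for blk in list(model.encoder.layers)[6:]]

def feature_diversity(tokens):
    """Average distance of tokens from their per-image mean token."""
    mean_tok = tokens.mean(dim=1, keepdim=True)
    return (tokens - mean_tok).norm(dim=-1).mean()

def feature_collapse_attack(x, eps=8 / 255, alpha=2 / 255, steps=10):
    """Label-free PGD that minimizes feature diversity, i.e. drives collapse."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        features.clear()
        model(torch.clamp(x + delta, 0.0, 1.0))  # hooks fill `features`
        loss = sum(feature_diversity(f) for f in features)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # gradient descent on diversity
            delta.clamp_(-eps, eps)              # stay inside the L_inf ball
        delta.grad.zero_()
    return torch.clamp(x + delta.detach(), 0.0, 1.0)

x = torch.rand(1, 3, 224, 224, device=device)    # stand-in for a real image batch
x_adv = feature_collapse_attack(x)
for h in hooks:
    h.remove()
```

Because the objective depends only on intermediate features and never on labels, the same perturbation recipe can in principle be reused across tasks, which is the property the abstract attributes to this family of attacks.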
Related papers
- PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification [73.64560354556498]
Vision Transformer (ViT) tends to overfit on the most distinct regions of training data, limiting its generalizability and its attention to holistic object features.
We present PartFormer, an innovative adaptation of ViT designed to overcome these limitations in object Re-ID tasks.
Our framework significantly outperforms the state of the art by 2.4% mAP on the most challenging MSMT17 dataset.
arXiv Detail & Related papers (2024-08-29T16:31:05Z)
- Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models [52.530286579915284]
We present the first study to investigate the adversarial transferability of vision-language pre-training models.
The transferability degradation is partly caused by the under-utilization of cross-modal interactions.
We propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance.
arXiv Detail & Related papers (2023-07-26T09:19:21Z)
- Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization [32.908816911260615]
Vision transformers (ViTs) have been successfully deployed in a variety of computer vision tasks, but they are still vulnerable to adversarial samples.
Transfer-based attacks use a local model to generate adversarial samples and transfer them directly to a target black-box model.
We propose the Token Gradient Regularization (TGR) method to overcome the shortcomings of existing approaches.
arXiv Detail & Related papers (2023-03-28T06:23:17Z)
- Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification [4.843654097048771]
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging.
Recent works have shown that ViTs are also susceptible to adversarial attacks and suffer significant performance degradation under attack.
We propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-08-04T19:02:24Z)
- Improving the Transferability of Adversarial Examples with Restructure Embedded Patches [4.476012751070559]
We attack the unique self-attention mechanism in ViTs by restructuring the embedded patches of the input.
Our method generates adversarial examples on white-box ViTs with higher transferability and higher image quality.
arXiv Detail & Related papers (2022-04-27T03:22:55Z)
- Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies impact ViTs' robustness to common corruptions.
We demonstrate that overlapping patch embedding and a convolutional Feed-Forward Network (FFN) improve robustness.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
arXiv Detail & Related papers (2022-04-26T08:22:34Z)
- On Improving Adversarial Transferability of Vision Transformers [97.17154635766578]
Vision transformers (ViTs) process input images as sequences of patches via self-attention.
We study the adversarial feature space of ViT models and their transferability.
We introduce two novel strategies specific to the architecture of ViT models.
arXiv Detail & Related papers (2021-06-08T08:20:38Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to the flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Tested on various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness when compared with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)